I, L. MICHAEL FURNESS, a citizen of the United Kingdom, residing at 2 Brookside,
Exning, Newmarket, United Kingdom, declare that: 

1. I was employed by Incyte Genomics, Inc. (hereinafter "Incyte") as a Director of
Pharmacogenomics until December 31, 2001. I am currently under contract to be a Consultant to
Incyte. 

2. In 1984, I received a B.Sc.(Hons) in Biomolecular Science (Biophysics and Biochemistry)
from Portsmouth Polytechnic.

From 1985-1987 I was at the School of Pharmacy in London, United Kingdom, during which
time I analyzed lipid methyltransferase enzymes using a variety of protein analysis methods,
including one-dimensional (1D) and two-dimensional (2D) gel electrophoresis, HPLC and a variety
of enzymatic assay systems. 

I then worked in the Protein Structure group at the National Institute for Medical Research 
until 1989, setting up core facilities for nucleic acid synthesis and sequencing, as well as assisting in 
programs on protein kinase C inhibitors. 

After a year at Perkin Elmer-Applied Biosystems as a technical specialist, I worked at the 
Imperial Cancer Research Fund between 1990-1992, on a Eureka-funded program collaborating with 
Amersham Pharmacia in the United Kingdom and CEPH (Centre d'Etude du Polymorphisme 
Humaine) in Paris, France, to develop novel nucleic acid purification and characterization methods. 

In 1992, I moved to Pfizer Central Research in the United Kingdom, where I stayed until
1998, initially setting up core DNA sequencing and then a DNA arraying facility for gene expression 
analysis in 1993. My work also included bioinformatics and I was responsible for the support of all 
Pfizer neuroscience programs in the United Kingdom. This then led me into carrying out detailed 
bioinformatics and wet lab work on the sodium channels, including antibody generation, Western and 
Northern analyses, PCR, tissue distribution studies, and sequence analyses on novel sequences 
identified. 

In 1998, I moved to Incyte to work in the Pharmacogenomics group, looking at the
application of genomics and proteomics to the pharmaceutical industry. In 1999, I was appointed
Director of the LifeExpress Lead Program which used microarray and protein expression data to 
identify pharmacologically and toxicologically relevant mechanisms to assist in improved drug 
design and development. 

On December 12, 2001, I founded Nuomics Consulting, Ltd., in Exning, UK, where I am
currently employed as Managing Director. Nuomics Consulting, Ltd. provides expert technical 
knowledge and advice to businesses in the areas of genomics, proteomics, pharmacogenomics, 
toxicogenomics, and chemogenomics.

3. I have reviewed the specification of a United States patent application that I understand
was filed on September 16, 1999 in the names of Preeti Lai et al. and was assigned Serial No.
09/397,558 (hereinafter "the Lai '558 application"). Furthermore, I understand that this United States
patent application was a divisional application of, and claimed priority to, United States patent 
application Serial No. 09/083,521, filed on May 22, 1998 (hereinafter "the Lai '521 application"), 
having the identical specification. My remarks herein will therefore be directed to the Lai '521 patent 
application, and May 22, 1998, as the relevant date of filing. In broad overview, the Lai '521 
specification pertains to certain nucleotide and amino acid sequences and their use in a number of 
applications, including gene and protein expression monitoring applications that are useful in 
connection with (a) developing drugs (e.g., for the treatment of cancer), and (b) monitoring the 
activity of drugs for purposes relating to evaluating their efficacy and toxicity. 

4. I understand that (a) the Lai '558 application contains claims that are directed to isolated 
polypeptides having either of the sequences shown as SEQ ID NO:1 and SEQ ID NO:2 (hereinafter
"the SEQ ID NO:1 and SEQ ID NO:2 polypeptides"), and (b) the Patent Examiner has rejected those
claims on the grounds that the specification of the Lai '558 application does not disclose a specific
and substantial asserted utility or a well established utility for the claimed SEQ ID NO:1 and SEQ ID
NO:2 polypeptides. I further understand that whether or not a patent specification discloses a specific 
and substantial asserted utility or a well established utility for its claimed subject matter is properly 
determined from the perspective of a person skilled in the art to which the specification pertains at the 
time the patent application was filed. In addition, I understand that a specific and substantial asserted 
utility or a well established utility under the patent laws must be a "real-world" utility. 

5. I have been asked (a) to consider with a view to reaching a conclusion (or conclusions) as 
to whether or not I agree with the Patent Examiner's position that the Lai '558 application and its 
parent, the Lai '521 application, do not disclose a specific and substantial "real-world" utility for the 
claimed SEQ ID NO:1 and SEQ ID NO:2 polypeptides, and (b) to state and explain the bases for any
conclusions I reach. I have been informed that, in connection with my considerations, I should
determine whether or not a person skilled in the art to which the Lai '521 application pertains on May
22, 1998, would have concluded that the Lai '521 application disclosed, for the benefit of the public,
a specific beneficial use of the SEQ ID NO:1 and SEQ ID NO:2 polypeptides in their then available
and disclosed forms. I have also been informed that, with respect to the "real-world" utility
requirement, the Patent and Trademark Office instructs its Patent Examiners in Section 2107 of the
Manual of Patent Examining Procedure, under the heading "I. 'Real-World Value' Requirement":
"Many research tools such as gas chromatographs, screening assays, and 
nucleotide sequencing techniques have a clear, specific and unquestionable utility 
(e.g., they are useful in analyzing compounds). An assessment that focuses on 
whether an invention is useful only in a research setting thus does not address whether 
the specific invention is in fact 'useful' in a patent sense. Instead, Office personnel 
must distinguish between inventions that have a specifically identified substantial 
utility and inventions whose asserted utility requires further research to identify or 
reasonably confirm." 

6. I have considered the matters set forth in paragraph 5 of this Declaration and have 
concluded that, contrary to the position I understand the Patent Examiner has taken, the specification 
of the Lai '521 patent application disclosed to a person skilled in the art at the time of its filing a
number of specific and substantial real-world utilities for the claimed SEQ ID NO:1 and SEQ ID
NO:2 polypeptides. More specifically, persons skilled in the art on May 22, 1998, would have
understood the Lai '521 application to disclose the use of the SEQ ID NO:1 and SEQ ID NO:2
polypeptides as research tools in a number of gene and protein expression monitoring applications 
that were well-known at that time to be useful in connection with the development of drugs and the 
monitoring of the activity of such drugs. I explain the bases for reaching my conclusion in this regard 
in paragraphs 7-13 below. 

7. In reaching the conclusion stated in paragraph 6 of this Declaration, I considered (a) the 
specification of the Lai '521 application, and (b) a number of published articles and patent documents 
that evidence gene and protein expression monitoring techniques that were well-known before the
May 22, 1998 filing date of the Lai '521 application. The published articles and patent documents I
considered are: 

(a) Anderson, N.L., Esquer-Blasco, R., Hofmann, J.-P., Anderson, N.G., A Two-Dimensional
Gel Database of Rat Liver Proteins Useful in Gene Regulation and Drug Effects Studies.
Electrophoresis, 12, 907-930 (1991) (hereinafter "the Anderson 1991 article") (copy annexed at Tab
A);

(b) Anderson, N.L., Esquer-Blasco, R., Hofmann, J.-P., Mehues, L., Raymackers, J., Steiner, 
S., Witzmann, F., Anderson, N.G., An Updated Two-Dimensional Gel Database of Rat Liver Proteins 
Useful in Gene Regulation and Drug Effect Studies. Electrophoresis, 16, 1977-1991 (1995)
(hereinafter "the Anderson 1995 article") (copy annexed at Tab B); 

(c) Wilkins, M.R., Sanchez, J.-C., Gooley, A.A., Appel, R.D., Humphrey-Smith, I.,
Hochstrasser, D.F., Williams, K.L., Progress with Proteome Projects: Why all Proteins Expressed by
a Genome Should be Identified and How To Do It. Biotechnology and Genetic Engineering Reviews,
13, 19-50 (1995) (hereinafter "the Wilkins article") (copy annexed at Tab C); 

(d) Celis, J.E., Rasmussen, H.H., Leffers, H., Madsen, P., Honore, B., Gesser, B., Dejgaard, 
K., Vandekerckhove, J., Human Cellular Protein Patterns and their Link to Genome DNA Sequence
Data: Usefulness of Two-Dimensional Gel Electrophoresis and Microsequencing. FASEB Journal, 5,
2200-2208 (1991) (hereinafter "the Celis article") (copy annexed at Tab D);

(e) Franzen, B., Linder, S., Okuzawa, K., Kato, H., Auer, G., Nonenzymatic Extraction of
Cells from Clinical Tumor Material for Analysis of Gene Expression by Two-Dimensional
Polyacrylamide Gel Electrophoresis. Electrophoresis, 14, 1045-1053 (1993) (hereinafter "the Franzen
article") (copy annexed at Tab E); 

(f) Bjellqvist, B., Basse, B., Olsen, E., Celis, J.E., Reference Points for Comparisons of Two- 
Dimensional Maps of Proteins from Different Human Cell Types Defined in a pH Scale Where 
Isoelectric Points Correlate with Polypeptide Compositions, Electrophoresis, 15, 529-539 (1994)
(hereinafter "the Bjellqvist article") (copy annexed at Tab F); and 

(g) Large Scale Biology Company Info; LSB and LSP Information; from 
http://www.Isbc.com (2001) (copy annexed at Tab G).

8. Many of the published articles I considered (i.e., at least items (a)-(f) identified in
paragraph 7) relate to the development of protein two-dimensional gel electrophoretic techniques for
use in gene and protein expression monitoring applications in drug development and toxicology. As I
will discuss below, a person skilled in the art who read the Lai '521 application on May 22, 1998
would have understood that application to disclose the SEQ ID NO:1 and SEQ ID NO:2 polypeptides
to be useful for a number of gene and protein expression monitoring applications, e.g., in the use of 
two-dimensional polyacrylamide gel electrophoresis and western blot analysis of tissue samples in 
drug development and in toxicity testing. 

9. Turning more specifically to the Lai '521 specification, the SEQ ID NO:1 and SEQ ID
NO:2 polypeptides are shown at pages 51-53 as two of seven sequences under the heading "Sequence
Listing." The Lai '521 specification specifically teaches that the "invention features substantially
purified polypeptides, prostate growth-associated membrane proteins, referred to collectively as
'PGAMP' and individually as 'PGAMP-1' and 'PGAMP-2.' In one aspect, the invention provides a
substantially purified polypeptide comprising an amino acid sequence selected from the group
consisting of SEQ ID NO:1, SEQ ID NO:2, a fragment of SEQ ID NO:1, and a fragment of SEQ ID
NO:2" (Lai '521 application at page 3, lines 5-9, as amended). With respect to SEQ ID NO:1, the
Lai '521 specification teaches that (a) the identity of the SEQ ID NO:1 polypeptide was determined
from a "prostate cDNA library", (b) the SEQ ID NO:1 polypeptide is the human prostate
growth-associated membrane protein referred to as "PGAMP-1" and is encoded by SEQ ID NO:3, and (c)
northern analysis shows that PGAMP-1 is expressed "in various libraries, at least 72% of which are
immortalized or cancerous and at least 18% of which involve immune response. Of particular note is
the expression of PGAMP-1 in cancerous or hyperplastic prostate (48%) and breast (7%)" tissues and
therefore PGAMP-1 "appears to play a role in neoplastic and reproductive disorders" (Lai '521
application at page 13, lines 27-32; page 14, lines 10-13; and page 25, lines 15-17). With respect to
SEQ ID NO:2, the Lai '521 specification teaches that (a) the identity of the SEQ ID NO:2
polypeptide was determined from a "breast cDNA library", (b) the SEQ ID NO:2 polypeptide is the
human prostate growth-associated membrane protein referred to as "PGAMP-2" and is encoded by
SEQ ID NO:4, and (c) northern analysis shows that PGAMP-2 is expressed "in various libraries, at
least 76% of which are immortalized or cancerous and at least 18% of which involve immune
response. Of particular note is the expression of PGAMP-2 in cancerous or hyperplastic prostate
(28%) and breast (10%)" tissues and therefore PGAMP-2 "appears to play a role in neoplastic and
reproductive disorders" (Lai '521 application at page 14, lines 14-19; page 15, lines 4-8; and page 25,
lines 20-22).

The Lai '521 application discusses a number of uses of the SEQ ID NO:1 and SEQ ID NO:2
polypeptides in addition to their use in gene and protein expression monitoring applications. I have
not fully evaluated these additional uses in connection with the preparation of this Declaration and do
not express any views in this Declaration regarding whether or not the Lai '521 specification
discloses these additional uses to be substantial, specific and credible real-world utilities of the SEQ
ID NO:1 and SEQ ID NO:2 polypeptides. Consequently, my discussion in this Declaration
concerning the Lai '521 application focuses on the portions of the application that relate to the use of
the SEQ ID NO:1 and SEQ ID NO:2 polypeptides in gene and protein expression monitoring
applications.

10. The Lai '521 application discloses that the polynucleotide sequences disclosed therein,
including the polynucleotides encoding the SEQ ID NO:1 and SEQ ID NO:2 polypeptides, are useful
as probes in chip based technologies. It further teaches that the chip based technologies can be used 
"for the detection and/or quantification of nucleic acid or protein" (Lai '521 application at page 23, 
lines 5-8). 

The Lai '521 application also discloses that the SEQ ID NO:1 and SEQ ID NO:2 polypeptides
are useful in other protein expression detection technologies. The Lai '521 application states that
"[I]mmunological methods for detecting and measuring the expression of PGAMP using either
specific polyclonal or monoclonal antibodies are known in the art. Examples of such techniques 
include enzyme-linked immunosorbent assays (ELISAs), radioimmunoassays (RIAs), and 
fluorescence activated cell sorting (FACS)" (Lai '521 application at page 23, lines 9-12). 
Furthermore, the Lai '521 application discloses that "[a] variety of protocols for measuring PGAMP, 
including ELISAs, RIAs, and FACS, are known in the art and provide a basis for diagnosing altered
or abnormal levels of PGAMP expression. Normal or standard values for PGAMP expression are
established by combining body fluids or cell extracts taken from normal mammalian subjects, 
preferably human, with antibody to PGAMP under conditions suitable for complex formation" (Lai 
'521 application at page 34, lines 2-6). 

In addition, at the time of filing the Lai '521 application, it was well known in the art that 
"gene" and protein expression analyses also included two-dimensional polyacrylamide gel 
electrophoresis (2-D PAGE) technologies, which were developed during the 1980s, as exemplified by 
the Anderson 1991 and 1995 articles (Tab A and Tab B). The Anderson 1991 article teaches that a
2-D PAGE map has been used to connect and compare hundreds of 2-D gels of rat liver samples from a
variety of studies including regulation of protein expression by various drugs and toxic agents (Tab A
at p. 907). The Anderson 1991 article teaches an empirically-determined standard curve fitted to a
series of identified proteins based upon amino acid chain length, and how that standard curve can be
used in protein expression analysis (Tab A at p. 911). The Anderson 1991 article teaches that "there
is a long-term need for a comprehensive database of liver proteins" (Tab A at p. 912). 

The Wilkins article is one of a number of documents that were published prior to the May 22,
1998 filing date of the Lai '521 application that describe the use of the 2-D PAGE technology in a
wide range of gene and protein expression monitoring applications, including monitoring and
analyzing protein expression patterns in human cancer, human serum plasma proteins, and in rodent
liver following exposure to toxins. In view of the Lai '521 application, the Wilkins article, and other
related pre-May 1998 publications, persons skilled in the art on May 22, 1998 clearly would have
understood the Lai '521 application to disclose the SEQ ID NO:1 and SEQ ID NO:2 polypeptides to
be useful in 2-D PAGE analyses for the development of new drugs and for monitoring the activities
of drugs for such purposes as evaluating their efficacy and toxicity, as explained more fully in
paragraph 12 below.

With specific reference to toxicity evaluations, those of skill in the art who were working on 
drug development in May 1998 (and for many years prior to May 1998) without any doubt 
appreciated that the toxicity (or lack of toxicity) of any proposed drug they were working on was one 
of the most important criteria to be considered and evaluated in connection with the development of 
the drug. They would have understood at that time that good drugs are not only potent, they are
specific. This means that they have strong effects on a specific biological target and minimal effects
on all other biological targets. Ascertaining that a candidate drug affects its intended target, and 
identifying undesirable secondary effects (i.e., toxic side effects), had been for many years among the 
main challenges in developing new drugs. The ability to determine which genes are positively
affected by a given drug, coupled with the ability to quickly and at the earliest time possible in the
drug development process identify drugs that are likely to be toxic because of their undesirable
secondary effects, has enormous value in improving the efficiency of the drug discovery process,
and is an important and essential part of the development of any new drug. In fact, the desire to
identify and understand toxicological effects using the experimental assays described above led Dr 
Leigh Anderson to found the Large Scale Biology Corporation in 1987, in order to pursue 
commercial development of the 2-D electrophoretic protein mapping technology he had developed. 
In addition, the company focused on toxicological effects on the proteome as clearly demonstrated by 
its goals and by its senior management credentials described in company documents (see Tab G at pp. 
1, 3, and 5).

Accordingly, the teachings in the Lai '521 application, in particular regarding use of the SEQ
ID NO:1 and SEQ ID NO:2 polypeptides in differential gene and protein expression analysis (2-D
PAGE maps) and in the development and the monitoring of the activities of drugs, clearly include
toxicity studies, and persons skilled in the art who read the Lai '521 application on May 22, 1998
would have understood that to be so.

11. As previously discussed (supra, paragraphs 7 and 8), in the mid-1980s the several
publications annexed to this Declaration at Tabs A through F evidence information that was available 
to the public regarding two-dimensional polyacrylamide gel electrophoresis technology and its uses 
in drug discovery and toxicology testing before the May 22, 1998 filing date of the Lai '521 
application. In particular the Celis article stated that "protein databases are expected to foster a 
variety of biological information... among others, ... drug development and testing" (See Tab D, p. 
2200, second column). The Franzen article shows that 2-D PAGE maps were used to identify 
proteins in clinical tumor material (See Tab E). The Lai '521 application clearly discloses that 
expression of PGAMP-1 and/or PGAMP-2 is associated with immortalized cell lines, cancerous and
hyperplastic prostate and breast tissue, and with the immune response (Lai '521 application at page
14, lines 10-13; page 15, lines 4-8; and page 25, lines 15-16 and 20-21). The Bjellqvist article 
showed that a protein may be identified accurately by its positional coordinates, namely molecular 
mass and isoelectric point (See Tab F). The Lai '521 application clearly disclosed SEQ ID NO:1 and
SEQ ID NO:2 from which it would have been routine for one of skill in the art to predict both the 
molecular mass and the isoelectric point using algorithms well known in the art at the time of filing. 

12. A person skilled in the art on May 22, 1998 who read the Lai '521 application, would 
understand that application to disclose the SEQ ID NO:1 and SEQ ID NO:2 polypeptides to be highly
useful in analysis of differential expression of proteins. For example, the specification of the Lai 
'521 application would have led a person skilled in the art in May 1998, who was using protein 
expression monitoring in connection with developing new drugs for the treatment of a neoplastic or 
reproductive disorder to conclude that a 2-D PAGE map that used the substantially purified SEQ ID 
NO:1 and SEQ ID NO:2 polypeptides would be a highly useful tool and to request specifically that
any 2-D PAGE map that was being used for such purposes utilize the SEQ ID NO:1 and/or SEQ ID
NO:2 polypeptides. Expressed proteins are useful for 2-D PAGE analysis in toxicology expression 
studies for a variety of reasons, particularly for purposes relating to providing controls for the 2-D 
PAGE analysis, and for identifying sequence or post-translational variants of the expressed sequences 
in response to exogenous compounds. Persons skilled in the art would appreciate that a 2-D PAGE 
map that utilized the SEQ ID NO:1 and SEQ ID NO:2 polypeptide sequences would be a more useful
tool than a 2-D PAGE map that did not utilize these protein sequences in connection with conducting 
protein expression monitoring studies on proposed (or actual) drugs for treating neoplastic and 
reproductive disorders for such purposes as evaluating their efficacy and toxicity. 

I discuss in more detail in items (a)-(b) below a number of reasons why a person skilled in the 
art, who read the Lai '521 specification in May 1998, would have concluded based on that 
specification and the state of the art at that time, that the SEQ ID NO:1 and SEQ ID NO:2
polypeptides would be highly useful tools for analysis of a 2-D PAGE map for evaluating the 
efficacy and toxicity of proposed drugs for neoplastic and reproductive disorders by means of 2-D 
PAGE maps, as well as for other evaluations.

(a) The Lai '521 specification contains a number of teachings that would lead persons
skilled in the art on May 22, 1998 to conclude that a 2-D PAGE map that utilized the substantially
purified SEQ ID NO:1 and/or SEQ ID NO:2 polypeptides would be a more useful tool for gene and
protein expression monitoring applications relating to drugs for treating neoplastic and reproductive
disorders than a 2-D PAGE map that did not use the SEQ ID NO:1 and/or SEQ ID NO:2
polypeptides. Among other things, the Lai '521 specification teaches that (i) the identity of the SEQ
ID NO:1 polypeptide was determined from a prostate cDNA library, (ii) the SEQ ID NO:1
polypeptide is the prostate growth-associated membrane protein referred to as PGAMP-1, and (iii) 
PGAMP-1 is expressed in various libraries derived from immortalized and cancerous tissues, 
cancerous or hyperplastic prostate and breast tissues, and tissues involved in the immune response, 
and, therefore, PGAMP-1 expression is "associated with neoplastic and reproductive disorders" (Lai 
'521 application at page 13, lines 27-32; page 14, lines 10-13; and page 25, lines 15-17; see 
paragraph 9, supra). Furthermore, the Lai '521 specification teaches that (i) the identity of the SEQ 
ID NO:2 polypeptide was determined from a breast cDNA library, (ii) the SEQ ID NO:2 polypeptide 
is the prostate growth-associated membrane protein referred to as PGAMP-2, and (iii) PGAMP-2 is 
expressed in various libraries derived from immortalized and cancerous tissues, cancerous or 
hyperplastic prostate and breast tissues, and tissues involved in the immune response, and, therefore, 
PGAMP-2 expression is "associated with neoplastic and reproductive disorders" (Lai '521 
application at page 14, lines 14-19; page 15, lines 4-8; and page 25, lines 20-22; see paragraph 9, 
supra). The substantially purified SEQ ID NO:1 and SEQ ID NO:2 polypeptides could, therefore, be
used as controls to more accurately gauge the expression of PGAMP in a sample, and consequently 
more accurately gauge the effect of a toxicant on expression of the gene. 

Moreover, the Lai '521 specification teaches that SEQ ID NO:1 and SEQ ID NO:2
share chemical and structural homology with known tumor-associated antigens. PGAMP-1 shares 
chemical and structural homology with rat heat-stable antigen CD4. These polypeptides share 21% 
identity and two potential transmembrane domains (Lai '521 application at page 14, lines 6-8; and 
Figure 1). In addition, PGAMP-1 has chemical similarity with CD44 antigen precursor (Lai '521 
application at page 14, lines 1-5). PGAMP-2 shares chemical and structural homology with human 
prostate-specific antigen and a fragment of the mouse apoptosis-associated tyrosine kinase, sharing
18% and 17% identity, respectively (Lai '521 application at page 14, lines 29-33; and Figures 2A,
2B, and 2C). In addition, all three of these proteins share six potential transmembrane regions and a
potential signal peptide, and PGAMP-2 and human prostate-specific antigen have similar isoelectric 
points (Lai '521 application at page 15, lines 1-2). 

(b) Persons skilled in the art on May 22, 1998 would have appreciated (i) that the 
protein expression monitoring results obtained using a 2-D PAGE map that utilized the SEQ ID NO:1
and/or SEQ ID NO:2 polypeptides would vary, depending on the particular drug being evaluated, and 
(ii) that such varying results would occur both with respect to the results obtained from the SEQ ID 
NO:1 and/or SEQ ID NO:2 polypeptides and from the 2-D PAGE map as a whole (including all its
other individual proteins). These kinds of varying results, depending on the identity of the drug being 
tested, in no way detract from my conclusion that persons skilled in the art on May 22, 1998, having 
read the Lai '521 specification, would specifically request that any 2-D PAGE map that was being 
used for conducting protein expression monitoring studies on drugs for treating neoplastic and 
reproductive disorders (e.g., a toxicology study or any efficacy study of the type that typically takes 
place in connection with the development of a drug) utilize the SEQ ID NO:1 and/or SEQ ID NO:2
polypeptides. Persons skilled in the art on May 22, 1998 would have wanted their 2-D PAGE map to
utilize the SEQ ID NO:1 and/or SEQ ID NO:2 polypeptides because a 2-D PAGE map that utilized
these polypeptides (as compared to one that did not) would provide more useful results in the kind of 
gene and protein expression monitoring studies using 2-D PAGE maps that persons skilled in the art 
have been doing since well prior to May 22, 1998. 

The foregoing is not intended to be an all-inclusive explanation of all my reasons for reaching 
the conclusions stated in this paragraph 12, and in paragraph 6, supra. In my view, however, it 
provides more than sufficient reasons to justify my conclusions stated in paragraph 6 of this 
Declaration regarding the Lai '521 application disclosing to persons skilled in the art at the time of its 
filing substantial, specific and credible real-world utilities for the SEQ ID NO:1 and SEQ ID NO:2
polypeptides. 

13. Also pertinent to my considerations underlying this Declaration is the fact that the Lai
'521 disclosure regarding the uses of the SEQ ID NO:1 and SEQ ID NO:2 polypeptides for protein
expression monitoring applications is not limited to the use of these proteins in 2-D PAGE maps. For
one thing, the Lai '521 disclosure regarding the technique used in gene and protein expression 
monitoring applications is broad (Lai '521 application at, e.g., page 23, lines 3 to 31; and page 34, 
lines 2-10). 

In addition, the Lai '521 specification repeatedly teaches that the proteins described therein 
(including the SEQ ID NO:1 and SEQ ID NO:2 polypeptides) may desirably be used in any of a
number of long established "standard" techniques, such as ELISA or western blot analysis, for 
conducting protein expression monitoring studies. See, e.g.: 

(a) Lai '521 application at p. 23, lines 9-12 ("Immunological methods for detecting 
and measuring the expression of PGAMP using either specific polyclonal or monoclonal antibodies 
are known in the art. Examples of such techniques include enzyme-linked immunosorbent assays 
(ELISAs), radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS)"); and 

(b) Lai '521 application at p. 34, lines 2-10 ("A variety of protocols for measuring 
PGAMP, including ELISAs, RIAs, and FACS, are known in the art and provide a basis for 
diagnosing altered or abnormal levels of PGAMP expression. Normal or standard values for PGAMP 
expression are established by combining body fluids or cell extracts taken from normal mammalian 
subjects, preferably human, with antibody to PGAMP under conditions suitable for complex 
formation[.] The amount of standard complex formation may be quantified by various methods, 
preferably by photometric means. Quantities of PGAMP expressed in subject, control, and disease 
samples from biopsied tissues are compared with the standard values. Deviation between standard 
and subject values establishes the parameters for diagnosing disease"). 

Thus, a person skilled in the art on May 22, 1998, who read the Lai '521 specification, would 
have routinely and readily appreciated that the SEQ ID NO:1 and SEQ ID NO:2 polypeptides,
disclosed therein, would be useful to conduct gene and protein expression monitoring analyses using 
2-D PAGE mapping or western blot analysis or any of the other traditional membrane-based protein 
expression monitoring techniques that were known and in common use many years prior to the filing 
of the Lai '521 application. For example, a person skilled in the art in May 1998 would have
routinely and readily appreciated that the SEQ ID NO:1 and SEQ ID NO:2 polypeptides would be
useful tools in conducting protein expression analyses, using the 2-D PAGE mapping or western
analysis techniques, in furtherance of (a) the development of drugs for the treatment of neoplastic and
reproductive disorders, and (b) analyses of the efficacy and toxicity of such drugs.

14. I declare further that all statements made herein of my own knowledge are true and that
all statements made herein on information and belief are believed to be true; and further, that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, and that willful false statements may jeopardize the 
validity of this application and any patent issuing thereon. 




L. Michael Furness, B.Sc. 



Signed at Exning, United Kingdom 
this 8th day of February, 2002. 
A comparison of selected mRNA and protein 
abundances in human liver 

1 Large Scale Biology Corporation, Rockville, MD, USA 
2 Incyte Pharmaceuticals, Palo Alto, CA, USA 

In order to obtain an estimate of the overall level of correlation between 
mRNA and protein abundances for a well-characterized pharmaceutically 
relevant biological system, we have analyzed human liver by quantitative 
two-dimensional electrophoresis (for protein abundances) and by Transcript 
Image methodology (for mRNA abundances). Incyte's LifeSeq database was 
searched for expressed sequence tag (EST) sequences corresponding to a series 
of 23 proteins identified on 2-D maps in the Large Scale Biology (LSB) 
Molecular Anatomy database, resulting in estimated abundances for 19 messages 
(4 were undetected) among 7926 liver clones sequenced. A correlation 
coefficient of 0.48 was obtained between the mRNA and protein abundances 
determined by the two approaches, suggesting that post-transcriptional 
regulation of gene expression is a frequent phenomenon in higher organisms. A 
comparison with published data (Kawamoto, S., et al., Gene 1996, 151-158) on 
the abundances of liver mRNAs for plasma proteins (secreted by the liver) 
suggests that higher abundance messages are strongly enriched in secreted 
sequences. Our data confirm this: of the 50 most abundant liver mRNAs, 29 
coded for secreted proteins, while none of the 50 most abundant proteins 
appeared to be secreted products (although four plasma and red blood cell 
proteins were present in this group as contaminants from tissue blood). 



1 Introduction 

The control of gene expression is achieved by a series 
of complex mechanisms which can be divided into 
two basic phases. The first phase, which involves the 
processing of sequence information from DNA, through 
transcription, RNA splicing, and transport through the 
nuclear membrane to yield a mature mRNA, has been 
relatively well characterized for many genes through 
nucleic acid sequencing approaches. The second phase, 
involving translation into protein (dependent on mRNA 
translatability), folding, assembly into multimers, trans- 
port to an appropriate subcellular location, post-transla- 
tional modifications, and final destruction, has been less 
comprehensively characterized. Both phases are likely to 
contain important control points associated with gene 
regulation underlying differentiation, disease processes 
and drug effects. For a variety of reasons, it would be 
useful to know the extent to which mRNA abundances 
are predictive of corresponding protein abundances. A 
series of powerful methodologies, including Transcript 
Imaging [1], SAGE [2], differential display [3] and array 
hybridization [4-6], have been developed to detect and 
in some cases quantitate differences in mRNA composi- 
tion between different samples. In parallel, high resolu- 
tion protein mapping systems, based on two-dimen- 
sional (2-D) electrophoresis [7], have been employed to 
build quantitative databases describing gene expression 
at the protein level [8-11]. By combining these ap- 
proaches, it is possible for the first time to examine both 
levels at which gene expression is controlled, and 
thereby to develop a global understanding of gene 
expression control. 

To date, we are aware of surprisingly little published 
work on the overall relationship of message and protein 
abundance, with the exception of a recent study by 
Kawamoto et al. [12], comparing mRNA levels obtained 
for plasma protein genes by transcript image methodol- 
ogy with the abundances of the corresponding plasma 
proteins in circulation. This report appeared to show a 
strong correlation between mRNA and protein abun- 
dance, based on data for nine human gene products. It 
seemed likely, however, that such secreted proteins con- 
stitute a special case, since they are rapidly delivered 
from the cell of synthesis to the plasma compartment, 
where many of the mechanisms that regulate cellular 
protein abundance are presumably absent. We therefore 
decided to compare mRNA and protein levels for a 
larger series of cellular molecules in order to see 
whether a simple relationship exists between mRNA and 
protein abundance for this class, and to see whether 
mRNAs for major cellular proteins are generally more or 
less abundant than those for major secreted products. 



2 Materials and methods 

Samples for 2-D electrophoresis were prepared by rapidly mixing a 
frozen powder of human liver (prepared and stored at liquid nitrogen 
temperature in the National Biomonitoring Specimen Bank at the US 
National Institute of Standards and Technology) with an 8-fold excess 
of 9 M urea, 2% NP-40, 1% mercaptoethanol and 2% carrier ampholytes 
(LKB 9-11). Ten µL of the resulting sample was analyzed using the 
Iso-DALT 2-D electrophoresis system, and the gels stained with 
colloidal Coomassie Brilliant Blue (CBB) G-250 as previously 
described [13-16]. Each stained slab gel was digitized in red light 
at 134 µm resolution using an Eikonix 1412 scanner and the digitized 
gel images processed using the Kepler software system (Large Scale 
Biology) to give protein abundances in terms of pixel × gray-level 
values, as well as group average abundances and standard deviations 
over a set of seven male human livers. Relative abundances were 
computed by dividing individual average abundances by the average 
total abundance of the proteins resolved on the gels. A series of 
proteins was identified on these gels based on close homology with 
identified rodent liver spots and on identifications published by 
Hughes et al. [17]. 

Total cellular RNA was extracted from samples of human liver tissue 
by the method of Chirgwin et al. [18], and poly-A+ RNA was prepared 
by hybridization to oligo-dT cellulose. Five µg of poly-A+ RNA was 
used to construct a cDNA library using the Gubler and Hoffman method 
[19] in bacteriophage-lambda UNIZap (Stratagene Inc., La Jolla, CA). 
The library was converted to plasmid DNA by bulk excision, and 
individual colonies were selected for DNA template preps. The 
templates were sequenced enzymatically (Sanger et al. [20]) on an 
ABI 373 automated DNA sequencer. Templates considered sequenced 
successfully contained > 230 bases of cDNA insert sequence after 
removal of repetitive and low information sequences, > 90% base call 
accuracy, and were not of mitochondrial, vector or host origin. 
Resulting DNA sequences were analyzed using the BLAST program for 
similarity with other known primate, mammalian, and subsequently all 
divisions of GenBank. Similarity data was stored and tabulated in the 
LifeSeq software (Incyte, Palo Alto, CA), from which relative 
fractions of specific gene products present within the starting RNA 
prep were calculated as follows: % abundance = (# clones representing 
each gene / total # of genes sampled) × 100. A total of 7925 clones 
were sequenced from liver obtained from two individuals: one male 
(5054 clones) and one female (2871 clones). Data from Table 1 of 
Kawamoto et al. [12] was replotted using protein abundances for human 
plasma proteins taken as mean values of the range presented in 
reference [21]. An error in the abundance of the haptoglobin α1S 
polypeptide (which was assumed in [12] to account for the entire 
abundance of the haptoglobin α2β2 tetramer) was corrected. 

Table 1. Protein and mRNA abundances in human liver reported for 23 selected molecules 
a) Protein abundance is given in pixel-gray levels (the integrated CBB optical density of the appropriate spot or spots on a 2-D gel), where 
multiple spots comprising a single gene product have been summed. Messenger RNA measurements are given as a percentage of the total 
number of clones sequenced in the relevant transcript images. 
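
The percent-abundance calculation described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not the LifeSeq implementation; the function name and the example clone counts of 1, 3 and 17 are chosen here for illustration, while the library sizes (5054 and 2871 clones) are taken from the text.

```python
def percent_abundance(clones_for_gene: int, total_clones: int) -> float:
    """Percent of sequenced clones that match a given gene."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    return 100.0 * clones_for_gene / total_clones


# Library sizes quoted in the text: one male and one female liver.
TOTAL_CLONES = 5054 + 2871

if __name__ == "__main__":
    # Illustrative clone counts only (not values from Table 1).
    for n_clones in (1, 3, 17):
        pct = percent_abundance(n_clones, TOTAL_CLONES)
        print(f"{n_clones:2d} clone(s) -> {pct:.4f}% of {TOTAL_CLONES} clones")
```

With a library of this size, a single matching clone corresponds to roughly 0.013% abundance, which sets the resolution of the mRNA measurements.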



3 Results 

Protein and mRNA abundance data were collected for a set of gene 
products identified on 2-D gels (Table 1). Standard deviations of the 
protein measurements across six individual livers were relatively 
low, averaging 19% of the mean abundance. Of the 23 selected 
proteins, mRNAs for 19 were detected in human liver transcript 
images. Of these 19, five were represented by 1 clone, three by 
2 clones, four by 3 clones, and the rest by between 4 and 17 clones. 
Of the four gene products undetected at the mRNA level, one 
(cytochrome oxidase subunit II: COX-II) was deleted from the 
Transcript Image dataset during standard initial sequence data 
workup, which removes all mitochondrial sequences. A plot of protein 
abundance (expressed as integrated Coomassie Blue absorbance averaged 
over seven individual livers) versus mRNA abundance (expressed as 
percentage of total cDNA clones in the transcript images of two 
livers) indicates a modest correlation between the two (Fig. 1). The 
Pearson product-moment correlation coefficient obtained from the 
19 pairs of measurements is 0.48. The abundance values obtained at 
the protein level spanned a 70-fold range, while the detectable mRNA 
abundances spanned a 16-fold range for these genes (although the 
latter value may reflect the limited number of clones sequenced). One 
particularly interesting subset of measurements concerns the β and 
γ actins. Here the mRNA abundances are, respectively, 0.189% and 
0.215%, whereas the protein abundances are, respectively, 1.41% and 
0.65% of the total. In this comparison, both sets of measurements are 
likely to be quite accurate, since numerous clones were detected for 
each of the two messages, and since the two proteins are so 
homologous, and have such close pIs, that they should bind CBB 
similarly. Nevertheless, the relative abundances at the RNA and 
protein levels are inverted (β actin is the more abundant protein, 
while γ actin has the more abundant message), and the mRNA:protein 
ratios for the two genes differ by more than a factor of two. 
Carbamyl phosphate synthase (CPS), the most abundant protein detected 
in liver over the pI range of conventional 2-D gels (pH 4-7), had a 
relative abundance of 2.83% (protein) and yet comprised only 0.139% 
of the total message (less than either actin). In this case, the 
mature protein is sequestered inside the mitochondrion, and therefore 
might be expected to show slow turnover and a consequent large 
disparity between mRNA and protein abundance. 

Figure 2. A log-log plot of data on mRNA abundance taken from 
Kawamoto et al. [12] versus average protein abundances in plasma 
taken from [21]. The protein abundance value for the haptoglobin α1S 
polypeptide has been corrected to reflect the fact that this subunit 
accounts for only 21% of the mass of the haptoglobin tetramer. 

mRNAs and proteins are plotted as a percentage of total detected 
molecules on a log scale. Message and protein points at the same rank 
are not, in general, products of the same gene. 



A reexamination (Fig. 2) of the data of [12] on genes for 
plasma proteins, using estimates for corresponding protein 
abundances revised to account for the α2β2 structure 
of haptoglobin, showed a higher correlation coefficient 
between mRNA and protein abundance (0.96). This 
value is probably exaggerated due to the large separation 
of the albumin values from the rest of the data: if 
albumin is omitted from the calculation, the correlation 
coefficient drops to -0.19. However, it is clear that the 
plasma proteins are represented by many more mRNA 
copies than major cellular proteins: albumin, for 
example, accounts for about 14% of the total number of 
clones examined [12], with a number of other plasma 
proteins accounting for more than 1% of the total each. 
By contrast, none of the cellular proteins chosen from 
the 2-D gel data accounted for much more than 0.1% of 
the mRNAs sequenced. To further pursue this observation, 
we compared the relative abundance distributions 
of the 100 top-ranked (most abundant) mRNAs and proteins 
in our data sets (Fig. 3). Forty-one of the top 100 
mRNAs, and 29 of the top 50, coded for proteins known, 
or expected from sequence, to be secreted from the liver, 
while none of the top 100 proteins appeared to be 
secretory forms of the human plasma proteins. The two most 
abundant proteins in these samples (hemoglobin β and 
albumin) as well as two of lower abundance (α1-antiprotease 
and transferrin) were blood proteins that constitute 
contaminants of the liver in this context, proteins 
which would have been removed by perfusion. 
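
The sensitivity of a correlation coefficient to a single well-separated point, as described for albumin above, can be reproduced with a small sketch. The data below are hypothetical values invented for illustration (eight clustered points plus one extreme point); they are not the measurements of [12] or of Table 1, and the function name is an assumption.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical abundances (arbitrary units): a loose cluster of eight
# gene products plus one extreme point playing the role of albumin.
cluster_mrna = [1, 2, 3, 4, 5, 6, 7, 8]
cluster_protein = [5, 3, 6, 2, 7, 1, 4, 3]
outlier_mrna, outlier_protein = 100, 100

r_with_outlier = pearson_r(cluster_mrna + [outlier_mrna],
                           cluster_protein + [outlier_protein])
r_without_outlier = pearson_r(cluster_mrna, cluster_protein)

if __name__ == "__main__":
    print(f"r with the extreme point:    {r_with_outlier:.2f}")
    print(f"r without the extreme point: {r_without_outlier:.2f}")
```

In this toy data set the coefficient is close to 1 when the extreme point is included and negative when it is left out, the same qualitative behavior reported for the plasma protein data.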



4 Discussion 

Despite extensive work on the regulation of many indi- 
vidual genes, little attention appears to have been paid 
to the global question of the relation between mRNA 
and corresponding protein abundance in eukaryotes. We 
have attempted to provide an initial estimate of the rela- 
tionship of mRNA and corresponding cellular protein 
abundances through use of correspondences between 
two databases: the Molecular Anatomy (2-D gel) and 
LifeSeq (Transcript Image) databases of human liver. 
Using a panel of 23 proteins identified on 2-D gels of 
human liver, we searched LifeSeq to determine the 
number of clones matching the corresponding gene 
sequence by BLAST. Matches were found for 19 pro- 
teins, and the correlation coefficient obtained over this 
set of data was 0.48. This number is intriguingly close to 
the middle position between a perfect correlation (1.0) 
and no correlation whatever (0.0). One simple interpreta- 
tion of such a value is that the two major phases of gene 
expression regulation (transcription through message 
degradation on the one hand, and translation through 
protein degradation on the other) are of approximately 
equal importance in determining the net output of func- 
tional gene product (protein). Several issues may limit 
the quantitative accuracy of this result. First, the protein 
measurements rely on CBB binding to a series of dif- 
ferent proteins. Although the measurements obtained 
show good (low) standard deviations across a set of six 
individual livers, it is well known that different proteins 
can bind CBB with different affinities. Thus the measure- 
ment scale for one protein may differ from another by 
up to approximately twofold. Since, however, these rela- 
tive scale errors should be normally distributed, we 
expect them to have little effect on the overall correla- 
tion. Precision of the mRNA measurements is also 
limited, in this case because a limited number of clones 
was detected for the selected proteins. Five genes, for 
example, were represented by only one clone each 
among the 7925 clones sequenced from the respective 
cDNA tissue libraries. This low relative expression at the 
mRNA level is expected, since a majority of the high 
abundance mRNAs in liver code for plasma proteins. 
However, such small numbers of clones lead to poten- 
tially large quantitative errors because of sampling error. 
Here again, we believe these errors should be relatively 
random across the set of proteins chosen, and thus 
should not skew the result appreciably. A third potential 
difficulty is that the databases used for the protein and 
mRNA abundance estimates were prepared from dif- 
ferent samples. In future, it will thus be of great interest 
to repeat the experiment using the same samples to 
examine both mRNA and protein abundances. 
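
The size of the sampling error for low clone counts can be estimated with a short sketch. It assumes that clone counts behave approximately as Poisson counts, so that the relative standard deviation of a count k is 1/sqrt(k); this statistical model is an assumption introduced here and is not stated in the source.

```python
import math


def relative_counting_error(clone_count: int) -> float:
    """Relative standard deviation of a count under a Poisson model."""
    if clone_count <= 0:
        raise ValueError("clone_count must be positive")
    return 1.0 / math.sqrt(clone_count)


if __name__ == "__main__":
    # Clone counts spanning the range reported in the Results (1 to 17).
    for k in (1, 2, 3, 4, 17):
        print(f"{k:2d} clone(s): about {100 * relative_counting_error(k):.0f}% relative error")
```

Under this assumption a gene represented by a single clone carries an uncertainty comparable to the measurement itself, while 17 clones reduce it to roughly a quarter.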

Despite these potential sources of error, at least one 
homologous pair of proteins (the β and γ actins) shows 
persuasive evidence of post-transcriptional regulation, 
with mRNA-to-protein ratios differing by more than a 
factor of two between the two genes. This is a particularly 
striking case since the two proteins are essentially 
indistinguishable in function (apart from affinity for 
MgADP [22]), have very similar sequences, and are produced 
in a constant ratio (approximately 2:1 in males) in 
virtually all cell types. One possible alternative explanation 
could be a sex difference in liver expression of γ 
actin, as is seen in rodents [23] where γ actin protein 
expression averages almost twice as high in females as 
males. This seems unlikely since 64% of clones in the 
RTI data were from male liver, and all the 2-D data was 
from male livers. 
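
The factor-of-two statement for the actins can be checked directly from the abundances quoted in the Results (mRNA 0.189% and 0.215%, protein 1.41% and 0.65% for β and γ actin, respectively). The sketch below is plain arithmetic; the variable names are chosen here for illustration.

```python
# Relative abundances (% of total) quoted in the Results section.
beta_actin = {"mrna": 0.189, "protein": 1.41}
gamma_actin = {"mrna": 0.215, "protein": 0.65}


def mrna_to_protein_ratio(gene: dict) -> float:
    """mRNA abundance divided by protein abundance for one gene."""
    return gene["mrna"] / gene["protein"]


beta_ratio = mrna_to_protein_ratio(beta_actin)
gamma_ratio = mrna_to_protein_ratio(gamma_actin)
fold_difference = gamma_ratio / beta_ratio

if __name__ == "__main__":
    print(f"beta actin  mRNA:protein = {beta_ratio:.3f}")
    print(f"gamma actin mRNA:protein = {gamma_ratio:.3f}")
    print(f"fold difference          = {fold_difference:.2f}")
```

The two ratios differ by a factor of about 2.5, consistent with the "more than a factor of two" stated in the text.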

An analogous set of data for plasma proteins secreted by 
the liver has been published by Kawamoto et al. [12] and 
we have reanalyzed their values to see whether a similar 
mRNA-to-protein relationship holds. It appears, based 
on nine plasma proteins, that a higher correlation coefficient 
applies: 0.96. This result is less convincing, however, 
because one gene product (albumin) is well-separated 
from the cluster of the remaining eight, and thus 
exercises a disproportionate influence on the correlation 
coefficient. In fact, if albumin is omitted from the calculation, 
the correlation coefficient is reduced to -0.19, 
which suggests a very poor correlation. 

What is perhaps more striking is the relatively much 
higher abundance of the plasma protein mRNAs as 
compared to major cellular proteins such as carbamyl 
phosphate synthase, the actins, or cytochrome b5. Mid- 
abundance plasma proteins were represented by mRNAs 
having approximately 100-fold higher relative abundance 
than mid-abundance cellular proteins. This result is veri- 
fied by a direct comparison of the relative abundance 
distributions of the 100 top-ranked mRNAs and proteins 
in our data sets (which are, in general, different sets of 
genes). Twenty-nine of the top 50 messages are secreted 
products, while none of the top 50 proteins appear to be 
the pro- form of a secreted molecule. Such a conclusion 
is not surprising, since the liver is responsible for gene- 
rating high protein concentrations in the relatively large 
plasma compartment of the body, but does so by means 
of closely coupled synthesis and secretion with little 
accumulation of precursor proteins in process. This 
points to a potentially significant difference in the pic- 
tures obtained from mRNA and protein abundance data- 
bases. Major secreted proteins appear to have much 
more abundant mRNAs than many important cellular 
proteins, and hence mRNA abundance databases that 
concentrate on a small number of the highest abun- 
dance messages may be biased towards secreted proteins 
over cellular molecules. This represents an advantage of 
the mRNA approach relative to protein databases in the 
search for novel cytokines and other secreted proteins, 
but a disadvantage in the characterization of cellular 
metabolic and control processes. Additionally, it suggests 
that mRNAs for secreted proteins may have, on the 
whole, shorter half-lives than mRNAs for cellular 
enzymes, the latter being more frequently regulated at 
the translational level. 

We also found important differences in the overall 
shapes of the relative abundance distributions of the 100 
top-ranked mRNAs and proteins. While both distributions 
contain a few very high abundance molecules (in 
the 3-10% range) they appear to diverge significantly 
below the 15th most abundant gene product, with proteins 
16-100 accounting for roughly twice as high a relative 
abundance as the 16th-100th mRNAs. Not all proteins 
are represented on the 2-D gels used here (which 
fail to resolve proteins with pI > 7), but the estimated 
40% of proteins thus excluded would not affect the 
shape of the distribution over positions 50-100 significantly 
if they have an abundance distribution similar to 
the pI 4-7 proteins (based on a simulation using the 
data shown). The mRNA abundance distribution covers 
all cloned messages (not a subset of genes), and for 
abundant mRNAs it should be complete as it stands. 
Altogether, the top 100 mRNAs comprise 51.3% of the 
total clones, while the top 100 proteins comprise 63.1% 
of the total protein detected. Hence it appears likely that 
the distribution of protein abundances is significantly 
different from that of mRNAs, showing a more gradual 
fall-off in the region examined, and that techniques able to 
detect down to a specified percent abundance threshold 
would reveal more proteins at a given threshold than 
mRNAs. As the protein and nucleic acid databases 
expand, we anticipate the possibility of generating 
successively more robust estimates of the global relationship 
between mRNA and protein abundance, and thus a 
better understanding of multi-level gene expression control 
in complex organisms such as man. 

Human liver samples analyzed by 2-D electrophoresis were 
kindly provided by the National Biomonitoring Specimen 
Bank at the US National Institute of Standards and Tech- 
nology under the direction of Dr. Stephen Wise. 
An updated two-dimensional gel database of rat liver 
proteins useful in gene regulation and drug effect 
studies 

We have improved upon the reference two-dimensional (2-D) electrophoretic 
map of rat liver proteins originally published in 1991 (N. L. Anderson et al., 
Electrophoresis 1991, 12, 907-930). A total of 53 proteins (102 spots) are now 
identified, many by microsequencing. In most cases, spots cut from wet, 
Coomassie Blue stained 2-D gels were submitted to internal tryptic digestion [2], 
and individual peptides, separated by high-performance liquid chromatography 
(HPLC), were sequenced using a Perkin-Elmer 477a sequenator. Additional 
spots were identified using specific antibodies. 



Figure 1 shows the current annotated 2-D map of F344 
rat liver, analyzed using the Iso-DALT system (20 X 25 
cm gels) and BDH 4-8 carrier ampholytes. Both the 
map itself and the master spot number system remain 
the same as shown in the original publication. Table 1 
lists the important features of each identification shown, 
including the gel position, pI, and Mr for the most 
abundant or most basic form of each protein. Using this 
extended base of identified spots, a series of four 
improved calibration functions has been derived for the 
pI and SDS-Mr axes (the first two of which are shown in 
Fig. 2A and B). Both forward and reverse functions are 
derived, so that one can compute the physical properties 
of a spot with a given gel location, or inversely compute 
the gel position expected for a protein having given 
physical properties: 



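$$Y_{\mathrm{RAT\ LIVER}} = f_{M_r \to \mathrm{RAT\ LIVER}\ Y}\left(M_{r,\ \mathrm{SEQUENCE\text{-}DERIVED}}\right) \tag{1}$$

$$X_{\mathrm{RAT\ LIVER}} = f_{\mathrm{p}I \to \mathrm{RAT\ LIVER}\ X}\left(\mathrm{p}I_{\mathrm{SEQUENCE\text{-}DERIVED}}\right) \tag{2}$$

$$M_{r,\ \mathrm{GEL\text{-}DERIVED}} = f_{\mathrm{RAT\ LIVER}\ Y \to M_r}\left(Y_{\mathrm{RAT\ LIVER}}\right) \tag{3}$$

$$\mathrm{p}I_{\mathrm{GEL\text{-}DERIVED}} = f_{\mathrm{RAT\ LIVER}\ X \to \mathrm{p}I}\left(X_{\mathrm{RAT\ LIVER}}\right) \tag{4}$$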
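
The idea of paired forward and reverse calibration functions can be illustrated with a small sketch. The source fits analytic equations whose coefficients are listed in its Table 2, which is not reproduced in this copy, so the sketch swaps in piecewise-linear interpolation between a few (gel X, pI) pairs as they read in Table 1. The function names, the choice of calibration points and the interpolation scheme are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple

# (gel X, experimental pI) pairs as they read in Table 1 of this copy.
CALIBRATION: List[Tuple[float, float]] = [
    (515.68, 4.73),
    (743.11, 5.15),
    (905.67, 5.41),
    (1165.12, 5.75),
    (1399.78, 6.00),
    (1586.09, 6.18),
    (2000.81, 6.73),
]


def _interpolate(points: List[Tuple[float, float]], x: float) -> float:
    """Piecewise-linear interpolation over points sorted by first element."""
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("value lies outside the calibrated range")
    i = min(bisect_right(xs, x), len(xs) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def gel_x_to_pi(gel_x: float) -> float:
    """Reverse direction (cf. Eq. 4): gel X coordinate -> pI."""
    return _interpolate(CALIBRATION, gel_x)


def pi_to_gel_x(pi: float) -> float:
    """Forward direction (cf. Eq. 2): pI -> expected gel X coordinate."""
    swapped = [(p, x) for x, p in CALIBRATION]
    return _interpolate(swapped, pi)


if __name__ == "__main__":
    x = 1000.0
    pi = gel_x_to_pi(x)
    print(f"gel X {x:.2f} -> pI {pi:.2f} -> gel X {pi_to_gel_x(pi):.2f}")
```

Because the calibration points are monotonic in both coordinates, the same table serves both directions; an analytic fit such as the one used in the source would additionally give smooth behavior beyond the calibrated range.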

Correspondence: Dr. Leigh Anderson, Large Scale Biology Corporation, 
9620 Medical Center Drive, Rockville, MD 20850-3338, USA (Tel: 
+301-424-5919; Fax: +301-762-4892; email: leigh@lsbc.com) 

Keywords: Two-dimensional polyacrylamide gel electrophoresis / Liver 
/ Map / Identification / Calibration 

A spreadsheet program (in Microsoft Excel) was developed 
to facilitate flexible computation of pIs from 
amino acid sequence data, and the results were entered 
into a relational database (Microsoft Access). A table of 
spot positions and sequence-derived pIs and Mr's was 
fitted with a large series of analytic equations using 
Tablecurve (Jandel Scientific), and the four conversion 
Eqs. (1)-(4), relating computed pI and gel X coordinate, 
or computed molecular weight and gel Y coordinate, 
were selected, based on criteria of simplicity, goodness 
of fit and favorable asymptotic behavior. Table 2 lists the 
equations and coefficients. Application of Eqs. (3) and 
(4) to a spot's X and Y coordinates, given in [1], produces 
improved Mr estimates, and allows computation of pI 
directly in pH units, instead of in terms of positions relative 
to creatine phosphokinase (CPK) charge standards. 
The inverse Eqs. (1) and (2) were used to compute the 
gel positions of a series of pI and Mr tick marks. These 
tick marks were plotted with SigmaPlot (Jandel), 
together with fiducial marks locating several prominent 
spots, and the resulting graphic was aligned over the synthetic 
gel image (computed by Kepler from the master 
gel pattern) using Freelance (Lotus Development). Maps 
were printed as Postscript output from Freelance, either 
in black and white (as shown here) or in color, where 
label color indicates subcellular location (available from 
the first author upon request). We have also used the rat 
liver 2-D pattern as presented here to calibrate the patterns 
of other samples. Using mixtures of rat liver and 
mouse liver samples, for example, we made composite 
2-D patterns that allow use of the rat pattern to standardize 
both axes of the mouse pattern. This was accomplished 
by deriving transformations relating the rat and 
mouse X, and separately the rat and mouse Y, axes 
(Table 2, lower half; Fig. 2C and D) based on a series of 
spots that coelectrophorese in these closely related species. 
These functions were then applied to derive equations 
relating the mouse liver X and Y to pI and SDS-Mr 
(Eqs. 5 and 6 below). The resulting standardized 2-D pattern 
for B6C3F1 mouse liver is shown in Fig. 3. 



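$$M_{r,\ \mathrm{MOUSE\ LIVER}} = f_{\mathrm{RAT\ LIVER}\ Y \to M_r}\left(f_{\mathrm{MOUSE\ LIVER}\ Y \to \mathrm{RAT\ LIVER}\ Y}\left(Y_{\mathrm{MOUSE\ LIVER}}\right)\right) \tag{5}$$

$$\mathrm{p}I_{\mathrm{MOUSE\ LIVER}} = f_{\mathrm{RAT\ LIVER}\ X \to \mathrm{p}I}\left(f_{\mathrm{MOUSE\ LIVER}\ X \to \mathrm{RAT\ LIVER}\ X}\left(X_{\mathrm{MOUSE\ LIVER}}\right)\right) \tag{6}$$
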

A slightly more complex approach can be used to stand- 
ardize samples that have few or no spots co-electropho- 
resing with rat liver proteins. In this case, a 2-D gel is 
prepared with a mixture of the two samples, and four 
functions (forward and backward, each for X and Y) are 
derived relating each sample's own master pattern to the 
composite. The required functions are then applied in a 
nested fashion to yield the desired result (using rat 
plasma as an example): 

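$$M_{r,\ \mathrm{RAT\ PLASMA}} = f_{\mathrm{RAT\ LIVER}\ Y \to M_r}\left(f_{\mathrm{RAT\ PLASMA+LIVER}\ Y \to \mathrm{RAT\ LIVER}\ Y}\left(f_{\mathrm{RAT\ PLASMA}\ Y \to \mathrm{RAT\ PLASMA+LIVER}\ Y}\left(Y_{\mathrm{RAT\ PLASMA}}\right)\right)\right) \tag{7}$$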
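
The nested application of calibration functions can be sketched as ordinary function composition. The three transforms below are hypothetical placeholders with invented coefficients (the source's actual functions and coefficients are in its Table 2, which is not reproduced here); only the nesting order follows the text.

```python
import math
from functools import reduce
from typing import Callable

Transform = Callable[[float], float]


def nest(*transforms: Transform) -> Transform:
    """Compose transforms given outermost first: nest(f, g, h)(x) = f(g(h(x)))."""
    def nested(value: float) -> float:
        return reduce(lambda acc, fn: fn(acc), reversed(transforms), value)
    return nested


# Hypothetical placeholder transforms (coefficients invented for illustration).
def plasma_y_to_composite_y(y: float) -> float:
    return 1.02 * y + 3.0


def composite_y_to_liver_y(y: float) -> float:
    return 0.98 * y - 1.5


def liver_y_to_mr(y: float) -> float:
    # Monotonically decreasing: larger gel Y means smaller molecular mass.
    return 200_000.0 * math.exp(-0.0017 * y)


# Innermost step first in time, outermost first in the argument list.
plasma_y_to_mr = nest(liver_y_to_mr, composite_y_to_liver_y, plasma_y_to_composite_y)

if __name__ == "__main__":
    print(f"Estimated Mr at plasma gel Y = 500: {plasma_y_to_mr(500.0):.0f}")
```

Keeping each transform as a separate function mirrors the source's approach, in which each mapping is fitted independently from its own set of shared spots and then chained.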
Figure 1. Master 2-D gel pattern of Fischer 344 rat liver proteins, annotated with 53 protein identifications and computed pI and Mr. 
Tentative identifications are in italic type. 



Table 1. Proteins identified in the 2-D pattern of F344 rat liver 

MSN a) | Protein ID b) | Protein name | Identification comments | Gel X | Experimental pI | Gel Y | Experimental Mr 
126 | HADO-HUMAN | 3-HA-3,4-DO: 3-hydroxyanthranilate-3,4-dioxygenase | Internal sequence | 871.95 | 5.36 | 921.35 | 30 207 
137, 159, 288, 258 | DIDH_RAT | 3HDD: 3-hydroxysteroid dihydrodiol reductase | Ab (T. M. Penning) and pure protein | 1857.52 | 6.51 | 822.52 | 34 406 
173 | MUP_RAT | α2u globulin | Presence in liver microsome lumen, abundance in kidney, pI, Mr | 919.16 | 5.43 | 1313.81 | 19 549 
38 | ACTB_HUMAN | Actin β | Analogy with other mammalian patterns (e.g. human) through coelectrophoresis | 763.40 | 5.19 | 693.64 | 41 586 
68 | ACTG_HUMAN | Actin γ | Analogy with other mammalian patterns (e.g. human) through coelectrophoresis | 779.42 | 5.21 | 692.36 | 41 677 
693 | AFAR_RAT | Aflatoxin B1 aldehyde reductase | Internal sequence | 1993.32 | 6.72 | 818.60 | 34 593 
28, 21, 33 | ALBU_RAT | Albumin | Coelectrophoresis with principal plasma protein | 1262.81 | 5.86 | 445.64 | 66 354 
43 | DHAM_RAT | Aldehyde dehydrogenase | N-terminal sequence and AAA | 1317.72 | 5.91 | 589.03 | 49 602 
96 | ARGI_RAT | Arginase | Internal sequence | 1730.72 | 6.34 | 756.02 | 37 119 
117 | SUAR_RAT | Arylsulfotransferase | Internal sequence | 1547.96 | 6.14 | 849.08 | 33 186 
1163, 1161, 1162, 20 | GR78_RAT | BiP (GRP-78) | Ab (F. Witzmann) | 665.33 | 5.01 | 397.39 | 74 564 
185 | CAH3_RAT | CA-III | Uncertain; by comparison with mouse | 1996.60 | 6.72 | 1017.02 | 26 837 
123 | CALM_HUMAN | Calmodulin | Analogy with human cellular patterns through coelectrophoresis | 23.05 | 4.03 | 1433.25 | 17 419 
3, 201, 41, 39, 22, 24 | CRTC_RAT | Calreticulin | Ab (Lance Pohl) | 310.39 | 4.34 | 433.10 | 68 206 



Electrophoresis 1995, 16, 1977-1981 
Table 1. continued 



MSN*' 



Protein IDb) Protein name 



Ideatificttiofi comments 



Gel Expenmenui Gel f" Experimental 



1184. 1186, CPSM.RAT 
114. 174. Ill 
3. 167, 157 



54. 61 


CaTa.RAT 


136 


C0X2.RAT 


87 


cyb5 rat 


41 


ck-rat" 


29 


CK-RAT 0 


5. 11 


ENPl-RAT 


60 


ENOA.RAT 


27 


ER60.RAT 


17 


atpb.rat 


196 


ATP7.RAT 


79 


F16P.RAT 


62.78 


DHE3.RAT 


125 


HAST- RAT" 


307 


hoi. rat 


415. 1250. 


HMCS.RAT 


933 




133. 144. 235 


HMCS.RAT 


8. 23. 1307 


HS7C.RAT 


15. 25. 1)0 


P60.RAT 


971 


HS70-RAT ,> 


1216. 1215. 90 HS90-RAT" 



256 



INCI-MUMAN 



415. 734 LAMB-RAT' 



SO 

227 

134 



LAMR-RaT*' 
FABL.RAT 



MDHC.MOUS 
E 

18. 35.226 CR75-RAT*' 

175. 251 NCPR.RAT 

1168. 1170. PDI.RAT 
1171 

47. 93 ALBU.RAT 

236 APAI.RAT 
320 IPKI.BOVIN 



Carbarny) phosphate 
synthase 

Cs ultsr 
COX-II 

Cytochrome B5 

Cyiokeraun 
Crtokeratin 
Endoplasmin 
Eaolasc A 
ER-60 

Fl ATPase fi 
Fl ATPase 6 

Fructose- 1 .6- b ts-pbospbause 

Gtuumaie dehydrogenase 
HAST-I: N-bydroxyaryl- 
amine sulfouaasfense 
Heme oxygenase 1 

HMG CoA synthase. 

cyiosotic 
HMC CoA synthase. 

mitochondria] ffragl 
HSC-70 

HSP-60 

HSP-70 
HSP-90 

Interferon-? induced 
protein 

T-aminin receptor" . 
L-FABP Oiver farry acid 

binding protein) 
Mailt* dehydrogenase 

MitconJ; grp75 

NADPH P450 reduoase 
PDI: Protein disulfide 

isomerase 
Pro-Albumin 



3-D of pure protein; comfirmed by 1 453 J 6 6.05 
A-termiuI sequence tad AAA 

Interna) sequence 2000.81 6.73 

Ah (J. W. Taaaman). confirmed by 45157 4.61 

miernaJ sequence 

2-D of pure protein; Ab; confirmed 515.68 4.73 

by AAA 

Location is cytoskeleta! fraction 1165.12 5.75 

Location in cytoskeleta) fraction 743.11 5.15 

Ab (F. Wiizmann) 567.73 4.83 

Interna) sequence and AAA 1399.78 6.00 

A-TerminaJ sequence OL M. Van Franx) 1184 JO 5.77 

//-Terminal sequence and AAA 629.06 4.95 

Internal sequence 1227.24 5.83 

Uncertain; by comparison with ID in 92434 5.44 
Garrison and Wager (JBC 257:13135-13143) 
N-TcnninaJ sequence and internal sequence 1887 J 9 635 

Internal sequence 1297.94 549 

Uncertain; tradable data from internal 12)939 5.81 
sequence 

Ab (3. Cenaenhausea) 1033.48 539

Ab (i. Germenbausea), terminal 666.40 5.02

sequence (Stetner/Lottspeich)

Positional homology (with bumin. etc.) 8U.8T 5.27

through coelectrophoresis

Ab (F. Witzmann); confirmed by N-terminal 845.09 532

sequence and AAA

Ab (F. Witzmann) 976.11 531

Ab (F. Witzmann) 659.86 5.00

Internal sequence 993.85 534

Positional homology with human through 737.10 5.14

coelectrophoresis, nuclear location

Internal sequence 534.02 4.77

Ab (N. M. Bass) 1586.09 6.18

Internal sequence 1270.85 5.86

Positional homology with human through 905.67 5.41

coelectrophoresis

2-D of pure protein 824.69 539

N-terminal sequence (R. M. van Frank), Ab 56430 4.83



152 



PNPH_MOUSE



1179, 1180, PYVC_RAT

1181, 1182,

1183

55, 103 SM30_RAT

135 SODC_RAT

172 TPM_RAT

277, 56 TBA1_RAT

50, 1225 TBB1_RAT

1224 VIME_RAT



Pro-APO A-1 lipoprotein
Protein kinase C inhibitor

Purine nucleoside

phosphorylase
Pyruvate carboxylase



SMP-30: Senescence
marker protein-30
Superoxide dismutase

Tm: tropomyosin

Tubulin α

Tubulin β

Vimentin



Microsomal lumen location, pI, Mr relative 1391.03 5.99
to albumin

Coelectrophoresis with plasma protein 920.41 5.43

1 Internal sequence; homology with bovine 1480.01 6.08
protein

Internal sequence 1507.19 6.10

Tentative; 2-D of pure protein (J. G. 1485.10 6.08
Heaslee, JBC, 1979); reported in Biochim.
Biophys. Acta 1022, 115-125

Internal sequence 721.71 5.11

AAA; confirmed by internal sequence 116134 5.74

(R. M. van Frank)

Location in cytoskeleton, 2-D position 47634 4.66

relative to human, Ab

Positional homology with human through 68832 5.06

coelectrophoresis, cytoskeletal location

Positional homology with human through 62139 4.93

coelectrophoresis, cytoskeletal location

Positional homology with human through 673.00 5.03

coelectrophoresis, cytoskeletal location



181.64 160 640 



499.64 

1062.67 



58 968 

25 504 



137035 18 493 



569.09 
60533 
26337 
62334 
52331 
588.83 
1184.65 
737.77 

566.92 
86135 

915.71 

53833 

1019.42 

425.76 

520.03 

43734 
329 

1006.04 



51 448 

48 187 
112 194 

46 674 
56.169 

49 620 
22 310 
38 858 

5) 655 
32 638 

30 423 

54 571 

26 811 

69 521 

56 561 

67 674 
90 107 

27 237 



425.19 69 615 



697.62 
1483.43 

861.96 

413.67 

39331 
528.47 



41 327 
16 622 

32 620 

71 589 

75 366 
55 618 



446.68 66 195 



113731 
1458.81 

911.16 

22332 



23 467 
17 007 

30 599 

131 589 



830.10 34 051 

1318.6! 18 173 

957.86 28 865 

537.67 54 620 

535.48 54 855 

S3930 54 426
2-D gel pattern for B6C3F1 mouse liver, standardized using the F344 rat liver pattern identifications, according to the method
in the text. Twenty-nine proteins are identified.



P^RATflASMA ~ AaTLIVXH x-,1 (ZlUTfLASMA.Lrv-Ell X -RAT LIVER X 

(/RAT PLASMA X-RATPLASMA..LrvtR X (^RjkT PLASMa))) 

(8) 

This unified approach, in which one well-populated 2-D
pattern is used to standardize a family of other patterns,
has the additional advantage that the resulting pI and Mr
scales are directly compatible. Hence one can compare
the relative pIs of mouse and rat versions of a sequenced
protein in a consistent pI measurement system,
and select likely inter-species analogs based on positional
relationships on common scales. Adoption of
immobilized pH gradient (IPG) technology [4-7] will
result in substantial improvements in pI positional
reproducibility for standard 2-D maps such as those presented
here; however, we believe that our approach will
continue to be useful in establishing the empirical pH
gradient actually achieved by such gels under given
experimental conditions (temperature, urea concentration,
etc.), and in relating patterns run on different IPG
ranges and using different lots of IPG gels (between
which some variation will persist). Development of
rodent organ maps is a continuing effort in our laboratories
[8-10], and results in regular additions of identified
proteins. Those who wish to receive current rodent liver
maps, with color annotations, should send a stamped
self-addressed envelope to the first author.



We would like to thank the individuals who provided antibodies
mentioned in Table 1, and R. M. van Frank for unpublished
sequence data.

identify all cDNA species, and the approach does not easily allow a systematic
screening. Analysis of gene expression by the study of proteins present in a cell or
tissue presents a favorable alternative. This can be achieved by use of two-dimensional
(2-D) gel electrophoresis, quantitative computer image analysis, and protein identification
techniques to create 'reference maps' of all detectable proteins. Such reference
maps establish patterns of normal and abnormal gene expression in the organism, and
allow the examination of some post-translational protein modifications which are
functionally important for many proteins. It is possible to screen proteins systematically
from reference maps to establish their identities.

To define protein-based gene expression analysis, the concept of the 'proteome'
was recently proposed (Wilkins et al., 1995; Wasinger et al., 1995). A proteome is the
entire PROTein complement expressed by a genOME, or by a cell or tissue type. The
concept of the proteome has some differences from that of the genome, as while there
is only one definitive genome of an organism, the proteome is an entity which can
change under different conditions, and can be dissimilar in different tissues of a single
organism. A proteome nevertheless remains a direct product of a genome. Interestingly,
the number of proteins in a proteome can exceed the number of genes present,
as protein products expressed by alternative gene splicing or with different
post-translational modifications are observed as separate molecules on a 2-D gel. As an
extrapolation of the concept of the 'genome project', a 'proteome project' is research
which seeks to identify and characterise the proteins present in a cell or tissue and
define their patterns of expression.

Proteome projects present challenges of a similar magnitude to that of genome
projects. Technically, the 2-D gel electrophoresis must be reproducible and of high
resolution, allowing the separation and detection of the thousands of proteins in a cell.
Low copy number proteins should be detectable. There should be computer gel image
analysis systems that can qualitatively and quantitatively catalog the electrophoretically
separated proteins, to form reference maps. A range of rapid and reliable techniques
must be available for the identification and characterisation of proteins. As a consequence
of a proteome project, protein databases must be assembled that contain
reference information about proteins; such databases must be linked to genomic
databases and protein reference maps. Databases should be widely accessible and easy
to use.

Recently, there have been many changes in the techniques and resources available
for the analysis of proteomes. It is the aim of this chapter to discuss the status of the
areas outlined above, and to review briefly the progress of some current proteome
projects.

Two-dimensional electrophoresis of proteomes 

Two-dimensional (2-D) gel electrophoresis involves the separation of proteins by their
isoelectric point in the first dimension, then separation according to molecular weight
by sodium dodecyl sulfate electrophoresis in the second dimension. Since first
described (Klose, 1975; O'Farrell, 1975; Scheele, 1975), it has become the method of
choice for the separation of complex mixtures of proteins, albeit with many modifications
to the original techniques. 2-D electrophoresis forms the basis of proteome
projects through separating proteins by their size and charge (Hochstrasser et al.,

2-D GEL RESOLUTION AND REPRODUCIBILITY

The challenge of separating complex mixtures of proteins by 2-D gel electrophoresis
has been to achieve high resolution and reproducibility. High resolution
ensures that a maximum of protein species are separated, and high reproducibility is
vital to allow comparison of gels from day to day and between research sites. These
factors can be difficult to achieve.

Carrier ampholytes are a common means of isoelectric focusing for the first
dimension of 2-D electrophoresis. Gels are usually focused to equilibrium to separate
proteins in the pI range 4 to 8, and run in a non-equilibrium mode (NEPHGE) to
separate proteins of higher pI (7 to 11.5) (O'Farrell, 1975; O'Farrell, Goodman and
O'Farrell, 1977). Unfortunately, the use of carrier ampholytes in the isoelectric
focusing procedure is susceptible to 'cathodic drift', whereby pH gradients established
by prefocusing of ampholytes slowly change with time (Righetti and Drysdale, 1973).
Carrier ampholyte pH gradients are also distorted by high salt concentration of
samples (Bjellqvist et al., 1982), and by high protein load (O'Farrell, 1975). A further
limitation is that isoelectric focusing gels, which are cast and subject to electrophoresis
in narrow glass tubes, need to be extruded by mechanical means before application
to the second dimension - a procedure that potentially distorts the gel. Nevertheless,
many of the above shortcomings can be avoided by loading small amounts of 14C or 35S
radiolabeled samples (Garrels, 1989; Neidhardt et al., 1989; Vandekerckhove et al.,
1990). High sensitivity detection is then achieved through use of fluorography or
phosphorimaging plates (Bonner and Laskey, 1974; Johnston, Pickett and Barker,
1990; Patterson and Latter, 1993). However, this approach is only practicable for
organisms or tissues that can be radiolabeled.

An alternative technique, which is becoming the method of choice for the first
dimension separation of proteins, involves isoelectric focusing in immobilized pH
gradient (IPG) gels (Bjellqvist et al., 1982; Görg, Postel and Günther, 1988; Righetti,
1990). Immobilized pH gradients are formed by the covalent coupling of the pH
gradient into an acrylamide matrix, creating a gradient that is completely stable with
time. IPG gels are usually poured onto a stiff backing film, which is mechanically
strong and provides easy gel handling (Ostergren, Eriksson and Bjellqvist, 1988). The
major advantages of IPG separations are that they do not suffer from cathodic drift,
they allow focusing of basic and very acidic proteins to equilibrium, pH gradients can
be precisely tailored (linear, stepwise, sigmoidal), and that separations over a very
narrow pH range are possible (0.05 pH units per cm) (Righetti, 1990; Bjellqvist et al.,
1982, 1993a; Sinha et al., 1990; Görg et al., 1988; Gelfi et al., 1987; Günther et al.,
1988). However, it is not currently possible to use IPG gels to separate very basic
proteins of isoelectric point greater than 10, although this is under development.
Narrow pH range separations are useful to address problems of protein co-migration
in complex samples, allowing 'zooming in' on regions of a gel (Figure 2). IPG gel
strips are now commercially available, which begin to address the problems of intra-
and inter-lab isoelectric focusing reproducibility.
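
The gradient slope quoted above translates directly into distances on a gel. The following is a minimal sketch of that arithmetic, assuming a strictly linear gradient and a hypothetical 18 cm strip; the strip length, the example pI values and the function names are illustrative assumptions and do not come from the text.

```python
def ph_span(units_per_cm: float, length_cm: float) -> float:
    """Total pH range covered by a linear gradient of the given slope."""
    return units_per_cm * length_cm


def focus_position_cm(pi: float, ph_start: float, ph_end: float, length_cm: float) -> float:
    """Distance from the acidic end at which a protein of the given pI focuses."""
    if not ph_start <= pi <= ph_end:
        raise ValueError("pI lies outside the gradient")
    return length_cm * (pi - ph_start) / (ph_end - ph_start)


def separation_cm(delta_pi: float, units_per_cm: float) -> float:
    """Distance between two proteins whose pI values differ by delta_pi."""
    return delta_pi / units_per_cm


STRIP_CM = 18.0  # hypothetical strip length, chosen only for illustration

# Span of a narrow-range gradient at 0.05 pH units per cm.
print(ph_span(0.05, STRIP_CM))
# Position of a pI 5.5 protein on a hypothetical pH 4-7 gradient.
print(focus_position_cm(5.5, 4.0, 7.0, STRIP_CM))
# Spacing of two proteins 0.02 pH units apart on the narrow gradient.
print(separation_cm(0.02, 0.05))
```

On these assumptions a narrow-range strip covers less than one pH unit, which is why such gradients help to pull apart proteins that co-migrate on a wide-range gel.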

There are two means of electrophoresis for the second dimension separation of
proteins: vertical slab gels and horizontal ultrathin gels (Görg, Postel and Günther,
1988). Both are usually SDS-containing gradient gels of approximately I \<7 to 15%
acrylamide, which separate proteins in the molecular mass range of 10 - 150 kD. A
stacking gel is not usually used with slab gels, but is necessary when using horizontal
gel setups (Görg, Postel and Günther, 1988). Comparisons have shown that there is
little or no difference in the reproducibility of electrophoresis using either approach
(Corbett et al., 1994a), but commercially available vertical or horizontal precast gels
will provide greater reproducibility for occasional users. For slab gel electrophoresis

Notwithstanding the advances described above, there is an increasing demand to
improve the reproducibility of 2-D electrophoresis to facilitate database construction
and proteome studies. Harrington et al. (1993) explain that if a gel resolves 4000
protein spots, and there is 99.5% spot matching from gel to gel, this will produce 20
spot errors per gel. This amount of error, which might accumulate with each gel to gel
comparison used in database construction, could produce an unacceptable degree of
uncertainty in gel databases. To address these issues, partial automation of large 2-D
gel separations has been undertaken (Nokihara, Morita and Kuriki, 1992; Harrington
et al., 1993). Although results are preliminary, spot to spot positional reproducibility
in one study was found to be threefold improved over manual methods (Harrington et
al., 1993). It should be noted that small 2-D gel formats (50 x 43 mm) have been
almost completely automated (Brewer et al., 1986), although these are not generally
used for database studies.
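
The spot-matching arithmetic above can be checked in a few lines. The sketch below reproduces the figure of 20 errors per gel and then illustrates how errors might accumulate over a chain of gel to gel comparisons. The accumulation model, in which each comparison fails independently for each spot, is a simplifying assumption made here for illustration and is not taken from the cited study; the function names are likewise hypothetical.

```python
def expected_mismatches(n_spots: int, match_rate: float) -> float:
    """Expected number of wrongly matched spots in one gel to gel comparison."""
    return n_spots * (1.0 - match_rate)


def mismatches_after_chain(n_spots: int, match_rate: float, n_comparisons: int) -> float:
    """Expected spots mismatched at least once over a chain of comparisons.

    Assumes every comparison fails independently for every spot, which is
    an illustrative simplification only.
    """
    still_correct = match_rate ** n_comparisons
    return n_spots * (1.0 - still_correct)


if __name__ == "__main__":
    # Single comparison: 4000 spots matched at 99.5%.
    print(round(expected_mismatches(4000, 0.995), 1))
    # Chains of comparisons, as when a database is built gel by gel.
    for k in (1, 5, 10):
        print(k, round(mismatches_after_chain(4000, 0.995, k), 1))
```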



MICROPREPARATIVE 2-D GEL ELECTROPHORESIS

With the advent of affordable protein microcharacterisation techniques, including
N-terminal microsequencing, amino acid analysis, peptide mass fingerprinting, phosphate
analysis and monosaccharide compositional analysis, a new challenge for 2-D
electrophoresis has been to maintain high resolution and reproducibility but to provide
protein in sufficient quantities for chemical analysis (high nanogram to low microgram
quantities of proteins per spot). This becomes difficult to achieve with very complex
samples such as whole bacterial cells, as the initial protein load is divided among 2000
to 4000 protein species. Two approaches are used for producing amounts of material
that can be chemically characterised. The first method is to run multiple gels, collect
and pool the spots of interest, and subject them to concentration (Ji et al., 1994; Walsh
et al., 1995; Rasmussen et al., 1992). In this approach, the concentration process must
also act as a purification step to remove accumulated electrophoretic contaminants
such as glycine. A more elegant approach has been to exploit the high loading capacity
of IPG isoelectric focusing. The high loading capacity of immobilised pH gradients
was described early (Ek, Bjellqvist and Righetti, 1983), but has only recently been
applied to 2-D electrophoresis (Hanash et al., 1991; Bjellqvist et al., 1993b). Up to 15
mg of protein can be applied to a single gel, yielding microgram quantities of hundreds
of protein species. A further benefit of this approach is that proteins present in
low abundance, which may not be visualised by lower protein loads, are more likely
to be detected. The use of electrophoretic or chromatographic prefractionation
techniques (Hochstrasser et al., 1991a; Harrington et al., 1992), followed by high loading
of narrow-range IPG separations (Bjellqvist et al., 1993b) provides a likely solution to
studies on proteins present in low abundance.
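
A rough calculation shows why a 15 mg load reaches the microgram range per spot. The sketch below assumes that the load is divided evenly among all species, which real samples never are, so the result is only an average; the function name is hypothetical.

```python
def mean_mass_per_species_ug(load_mg: float, n_species: int) -> float:
    """Average mass per protein species in micrograms, assuming an even split."""
    if n_species <= 0:
        raise ValueError("number of species must be positive")
    return load_mg * 1000.0 / n_species


# A 15 mg load spread over the 2000 to 4000 species of a complex sample.
for n in (2000, 4000):
    print(n, mean_mass_per_species_ug(15.0, n))
```

Because protein abundance is far from uniform, abundant species will greatly exceed this average and rare ones will fall well below it, which is consistent with the statement that hundreds, rather than all, of the species reach microgram quantities.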

Methods of protein detection 

There are many means for detecting proteins from 2-D gels. The method used will be
dictated by factors including protein load on gel (analytical or preparative), the
purpose of the gel (for protein quantitation or for blotting and chemical characterisation),
and the sensitivity required. The most common means of protein detection and
their applications are shown in Table 1. Most detection methods have drawbacks, for
Table 1: Common stains for 2-D gels or blots and their applications.
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PVDF or NC 
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Gels 
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Gel staining, not 
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protein to protein 
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JO ng protein Sirupat rt «/.. ls>SM; 

on band or Gharahdaglu rt aL. 

spot of gel IW2: 

Gi»ldhcrg rt aL. IMKK: 
Sanchez rt aL. ivu^ 



H\ * higher Vamaguchi and 
Asakawa. 1988: 

coomassie Eckcrskorn ct aL. 
IV92: 

Strupat rtaL. I»*i4 
Higher than (>ni/ rt aL. 
cimmasMc James rt aL. 



,f)0 n ? Sanchez rt aL. iwq^: 

protein on Sirup;n rt aL. |hwj. 
band or vpot Wilkin* rt aL. mg.V 
of gel 

I- Hlng Li i t aL. NSY. 

Hughe*. Mack anJ 
Hampanan. I^SK. 
Sirupai rt al.. |wvu 

MX) ng protein Camphell. 
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^poi ni gel Jorgcn^en. 

Goldberg et al., 1988
example, some glycoproteins are not stained by coomassie blue (Goldberg et al.,
1988), and many organic dyes are unsuitable for protein detection on PVDF if samples
are to be used for direct matrix-assisted laser desorption ionisation mass spectrometry
(Strupat et al., 1994).

Although most means of protein detection give some indication of the quantities of
protein present, in general they cannot be used for global quantitation. This is because
no protein stain is able consistently to detect proteins over a range of concentrations,
isoelectric points and amino acid compositions, and with a variety of
post-translational modifications (Goldberg et al., 1988; Li et al., 1989). Furthermore,
there are large differences in staining pattern when identical gels or blots are subjected
to different stains, including amido black, imidazole zinc, india ink, ponceau S,
colloidal gold, or coomassie blue (Tovey, Ford and Baldo, 1987; Ortiz et al., 1992).
The most common means of quantitating large numbers of proteins in a 2-D gel
involves the radiolabelling of protein samples prior to electrophoresis, and protein
quantitation based on fluorography and image analysis or liquid scintillation counting
(Garrels, 1989; Celis and Olsen, 1994). However, proteins which do not contain
methionine cannot be detected if only [35S]methionine is used for labelling. Amino
acid analysis of protein spots visualised by other techniques presents a likely means of
protein quantitation for the future.
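
The methionine limitation can be illustrated with a simple sequence check. This is a minimal sketch only: the two sequences are arbitrary strings invented for the example, the function names are hypothetical, and the check ignores protein processing such as removal of an initiator methionine.

```python
def methionine_count(sequence: str) -> int:
    """Number of methionine (M) residues in a one-letter amino acid sequence."""
    return sequence.upper().count("M")


def detectable_with_met_label(sequence: str) -> bool:
    """True if a methionine-only radiolabel could mark this protein at all."""
    return methionine_count(sequence) > 0


# Hypothetical sequences, invented purely for illustration.
examples = {
    "with_met": "MKTAYIAKQRMLE",
    "without_met": "GSHLVEALYLVCG",
}

for name, seq in examples.items():
    print(name, methionine_count(seq), detectable_with_met_label(seq))
```

The count also hints at why radiolabel signal is not an absolute measure: two proteins present in equal molar amounts give different signals if they contain different numbers of the labelled residue.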



BLOTTING OF PROTEINS TO MEMBRANES 



Electrophoretic blotting of proteins from two-dimensional polyacrylamide gels to
membranes presents many options for protein identification and microcharacterisation
which are not possible when proteins remain in gels. For example, when proteins are
blotted to polyvinylidene difluoride (PVDF) membranes, they can be identified by
N-terminal sequencing, amino acid analysis, or immunoblotting, or they may be subjected
to endoproteinase digestion, monosaccharide analysis, phosphate analysis, or direct
matrix-assisted laser desorption ionisation mass spectrometry (Matsudaira, 1987;
Wilkins et al., 1995; Jungblut et al., 1994; Sutton et al., 1995; Rasmussen et al., 1994;
Weitzhandler et al., 1993; Murthy and Iqbal, 1991; Eckerskorn et al., 1992). It is
possible to combine some of these procedures on a single protein spot on a PVDF
membrane (Packer et al., 1995; Wilkins et al., submitted; Weitzhandler et al., 1993).
This is useful when minimal amounts of protein are available for analysis. These
techniques will be explored in detail later in this review. Notwithstanding the above,
there are some disadvantages associated with blotting of proteins to membranes.
There is always loss of sample during blotting procedures (Eckerskorn and Lottspeich,
1993), and common protein detection methods are less sensitive or not applicable to
membranes (Table 1), presenting difficulties for the analysis of low abundance
proteins. Detailed discussion of the merits of available membranes and common
blotting techniques can be found elsewhere (Eckerskorn and Lottspeich, 1993; Strupat
et al., 1994; Patterson, 1994).



2-D gel analysis, documentation, and proteome databases 

Following protein electrophoresis and detection, detailed analysis of gel images is
undertaken with computer systems. For proteome projects, the aim of this analysis is
to catalogue all spots from the 2-D gel in a qualitative and if possible quantitative
manner, so as to define the number of proteins present and their levels of expression.
Reference gel images, constructed from one or more gels, form the basis of
two-dimensional gel databases. These databases also contain protein spot identities and
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4-iaiU of their po ,-transiational modifications. :. D * e J database k ■ 
linked to or mteerared with comprehensive nr^, 7 ? be * C,nn,nr 10 ^ 
« oL ,989: Simpson J J^i ~ ^ ^ 

databases, containing DN A sequence data rhr ftm ;f ' Jnd or ? a ™nV 

D g eU and pro,=,„ fjneona, ^^^^^^ * 
J. *enome and pro,eome pro,.™ pro,™ .viSSE^ ^V??"" - 
Database cited in Garreis e: ul.. 1994|. — ' ea<t p ™e:n 



GEL IMAGE ANALYSIS AND REFERENCE GELS 

After 2-D electrophoresis and protein visualisnrinn k. ,. • • 

pho.Vhorim«i M . ,n>a« 5 of .eh arc di.M«d " ? ' """'"f^ »' 

scanner. User" denser, or rt-^^^ScD?,- "*V* 
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pulsion, ,o rcmov t venica, and honzomal ^.rjSS^'T"' ^ 
«po, posi.ions and boundaries, and ,o calculate ,poH„,e„^^> '° " CleCI 

<poi ISSP. number, comamin* venical and ? K ' * r,w • 1 s,andar<1 
a,<i,ned ,o each de.eced S po, SkiES^ZES ? ""° mM '™- » 
K« »* noiable <of« .repackage* pte" V^s" ""'^ ' 



Table 2: Some Software Packages for the Analysis of Gel Images.

CALCULATION OF PROTEIN ISOELECTRIC POINT AND MOLECULAR WEIGHT

Estimation of the isoelectric point (pI) and molecular weight (MW) of proteins from
2-D gels provides fundamental parameters for each protein, which are also of use
during identification procedures (see following section). The pI and MW of proteins
are recorded in 2-D gel databases. Accurate estimations of protein pI and MW can be
obtained by using 20 or more known proteins on a reference map to construct standard
curves of pI and molecular weight, which are then used to calculate estimated pI and
MW of unknown proteins (Neidhardt et al., 1989; Garrels and Franza, 1989;
VanBogelen, Hutton and Neidhardt, 1990; Anderson and Anderson, 1991; Anderson et
al., 1991; Latham et al., 1992). Alternatively, the MW of individual proteins blotted
to PVDF can be determined very accurately by direct mass spectrometry (Eckerskorn
et al., 1992). Where immobilized pH gradients are used, the focusing position of
proteins allows their pI to be measured within 0.15 units of that calculated from the
amino acid sequence (Bjellqvist et al., 1993c). It must be noted, however, that proteins
carrying post-translational modifications may migrate to unexpected pI or MW
positions during electrophoresis (Packer et al., 1995).
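
The standard-curve idea can be sketched with piecewise-linear interpolation between calibrant spots: pI is interpolated directly against horizontal position, and MW is interpolated on a logarithmic scale against vertical position. Everything concrete below is an assumption made for illustration. The calibrant coordinates are invented, only four calibrants per axis are used for brevity where the text recommends 20 or more, and the cited groups may well fit their curves differently.

```python
import math
from bisect import bisect_left


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("position lies outside the calibrated range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical calibrants: (gel position in mm, known value).
PI_STANDARDS = [(5.0, 4.0), (60.0, 5.5), (110.0, 7.0), (150.0, 8.0)]
MW_STANDARDS = [(10.0, 150.0), (40.0, 66.0), (80.0, 30.0), (120.0, 10.0)]  # kD


def estimate_pi(x_mm):
    """Estimate pI from horizontal position using the pI standard curve."""
    xs = [pos for pos, _ in PI_STANDARDS]
    pis = [value for _, value in PI_STANDARDS]
    return interpolate(x_mm, xs, pis)


def estimate_mw(y_mm):
    """Estimate MW (kD) from vertical position, interpolating log10(MW)."""
    ys = [pos for pos, _ in MW_STANDARDS]
    logs = [math.log10(value) for _, value in MW_STANDARDS]
    return 10 ** interpolate(y_mm, ys, logs)


print(round(estimate_pi(32.5), 2), round(estimate_mw(60.0), 1))
```

An unknown spot is therefore only as well placed as the calibrants that bracket it, which is one reason a well-populated reference map matters.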

SPOT QUANTITATION AND EXPRESSION ANALYSIS

A major challenge faced in proteome projects is the quantitative analysis of proteins
separated by 2-D electrophoresis. The most accurate means of protein quantitation is
to determine chemically the amount of each protein present by amino acid
compositional analysis. However, the current method of choice for quantitative analysis
of many proteins is to radiolabel samples with [35S]methionine or 14C amino acids,
perform the 2-D electrophoresis, and measure protein levels in disintegrations per
minute (dpm) or units of optical density. Quantitation is achieved either by liquid
scintillation counting, or by gel image analysis where spot densities are quantitated
by reference to gel calibration strips containing known amounts of radiolabelled
protein or against the integrated optical density of all spots visualised (Vandekerckhove
et al., 1990; Celis et al., 1990b; Celis and Olsen, 1994; Garrels, 1989; Latham,
Garrels and Solter, 1993; Fey et al., 1994). All approaches are effectively
normalised against the total disintegrations per minute loaded onto the gel.
Limitations that remain with radiolabelling methods are that absolute quantitation is
not achieved because all proteins have varying amounts of any amino acid, and that
only easily labelled samples can be investigated. Quantitative silver staining presents
an alternative (Giometti et al., 1991; Harrington et al., 1992; Rodriguez et al., 1993;
Myrick et al., 1993), which when undertaken with [35S]thiourea (Wallace and Saluz,
1992) is of extremely high sensitivity.
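
Normalisation against the total signal on a gel can be sketched as follows. Each spot is expressed as parts per million of the summed signal, so that two gels loaded with different total amounts become comparable. The spot names, the values and the choice of parts per million as the unit are assumptions made for illustration and are not taken from the cited work.

```python
def normalise_ppm(spot_signal):
    """Express each spot as parts per million of the total signal on its gel."""
    total = sum(spot_signal.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {spot: 1e6 * value / total for spot, value in spot_signal.items()}


# Hypothetical integrated spot signals (dpm or optical density units).
gel_a = {"spot_1": 1200.0, "spot_2": 300.0, "spot_3": 4500.0}
# The same sample at twice the load: raw values differ, proportions do not.
gel_b = {"spot_1": 2400.0, "spot_2": 600.0, "spot_3": 9000.0}

norm_a = normalise_ppm(gel_a)
norm_b = normalise_ppm(gel_b)
for spot in gel_a:
    print(spot, round(norm_a[spot]), round(norm_b[spot]))
```

The normalised values are relative, not absolute: they describe each protein's share of the detected signal, which is the limitation noted above for radiolabelling methods.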

When protein spots from samples prepared under different conditions are quantitated
and matched from gel to gel, it becomes possible to examine changes and patterns in
protein expression. Large scale investigation of up- and down-regulation of proteins
can be undertaken. For example, simian virus 40 transformed human keratinocytes
were shown to have 177 up-regulated and 58 down-regulated proteins compared to
normal keratinocytes (Celis and Olsen, 1994); detailed synthesis profiles of 1200
proteins have been established in 1 to 4 cell stage mouse embryos (Latham et al.,
1991, 1992); and 4 proteins out of 197 were found to be markers of
cadmium toxicity in urinary proteins (Myrick et al., 1993). Complex global changes
in protein expression as a result of gene disruptions have also been investigated (S. Fey
and P. Mose Larsen, personal communication). Impressively, large gel sets showing
protein expression under different conditions can be globally investigated using
statistical methods that find groups of related objects within a set. For example, the
REF52 rat cell line database, consisting of 79 gels from 12 experimental groups where
each gel contains quantitative data for 1600 crossmatched proteins, has been analysed
by cluster analysis (Garrels et al., 1990). This revealed clusters of proteins that, for
example, were induced or repressed similarly under simian virus 40 or adenovirus
transformation, suggesting a common mechanism. Protein groups that were induced
or repressed during culture growth to confluence were also found. It is obvious that the
potential for investigation of cellular control mechanisms by these approaches is
immense. It is equally clear that investigations of gene expression of this scale are
currently technically impossible using nucleic-acid based techniques.
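
The kind of comparison described above can be sketched in two steps: label each matched spot as up-regulated, down-regulated or unchanged relative to a control, then group the spots that respond in the same way under every treatment. This grouping is a deliberately crude stand-in for cluster analysis and is not the method used in the cited studies. The twofold threshold, the spot names and the expression levels are all assumptions made for illustration.

```python
from collections import defaultdict


def classify(control, treated, fold=2.0):
    """Label a spot 'up', 'down' or 'unchanged' from two normalised levels."""
    if control <= 0 or treated <= 0:
        raise ValueError("levels must be positive")
    ratio = treated / control
    if ratio >= fold:
        return "up"
    if ratio <= 1.0 / fold:
        return "down"
    return "unchanged"


def group_by_response(levels, control_key, fold=2.0):
    """Group spots that share the same response pattern across treatments."""
    groups = defaultdict(list)
    for spot, by_condition in levels.items():
        control = by_condition[control_key]
        pattern = tuple(
            (cond, classify(control, value, fold))
            for cond, value in sorted(by_condition.items())
            if cond != control_key
        )
        groups[pattern].append(spot)
    return dict(groups)


# Hypothetical normalised spot levels under three conditions.
LEVELS = {
    "spot_A": {"normal": 100.0, "SV40": 350.0, "adenovirus": 300.0},
    "spot_B": {"normal": 80.0, "SV40": 20.0, "adenovirus": 30.0},
    "spot_C": {"normal": 50.0, "SV40": 160.0, "adenovirus": 140.0},
    "spot_D": {"normal": 60.0, "SV40": 65.0, "adenovirus": 58.0},
}

for pattern, spots in group_by_response(LEVELS, "normal").items():
    print(pattern, spots)
```

Spots that land in the same group under both transformations are the sort of candidates the text describes as suggesting a common mechanism, although a real analysis would work on continuous values across many gels.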



Table 3: Some proteome databases and their special features



Proteome database: E. coli gene-protein database
Special features: Gel spots linked with GenBank and Kohara clones; quantitative spot measurements under different growth conditions
References: VanBogelen and Neidhardt, 1991; VanBogelen et al., 1992

Proteome database: Human heart databases
Special features: Identification of disease markers; two separate databases have been established
References: Baker et al., 1992; Corbett et al., 1994b; Jungblut et al., 1994

Proteome database: Human keratinocyte database
Special features: Extensive identifications; quantitative spot measurements of transformed cells; identification of disease markers
References: Celis et al., 1990a; Celis et al., 1993; Celis and Olsen, 1994

Proteome database: Mouse embryo database
Special features: Quantitative spot measurements through 1 to 4 cell stage
References: Latham et al., 1991; Latham et al., 1992

Proteome database: Mouse liver database (Argonne Protein Mapping Group)
Special features: Documents changes due to exposure to ionizing radiation and toxic chemicals
References: Giometti, Taylor and Tollaksen, 1992

Proteome database: Rat liver epithelial database
Special features: Detailed subcellular fractionation studies
References: Wirth et al., 1991; Wirth et al., 1993

Proteome database: Rat liver database
Special features: Extensive studies on regulation of proteins by drugs and toxic agents
References: Anderson and Anderson, 1991; Anderson et al., 1992; Richardson, Horn and Anderson, 1994

Proteome database: REF52 rat cell line database
Special features: Accessible via World Wide Web; quantitative spot measurements under different conditions
References: Garrels and Franza, 1989; Boutell et al., 1994

Proteome database: SWISS-2DPAGE containing human reference maps
Special features: Accessible via World Wide Web; completely integrated with SWISS-PROT and SWISS-3DIMAGE
References: Appel et al., 1993; Hochstrasser et al., 1992; Hughes et al., 1993; Golaz et al., 1993

Proteome database: Yeast Protein Database (YPD) and Yeast Electrophoretic Protein Database (YEPD)
Special features: Completely crossreferenced organism database; YPD has extensive information on over 3500 proteins; YEPD has many identifications
References: Garrels et al., 1994



FEATURES OF PROTEOME DATABASES 



Proteome projects rely heavily on computer databases to store information about all
proteins expressed by an organism. 'Proteome databases' should contain detailed
information on proteins already characterised elsewhere, as well as protein data from
2-D gels such as apparent pI and MW, expression level under different conditions,
subcellular localisation, and information on post-translational modifications. Images
of reference 2-D gels, showing protein SSP numbers and protein identifications,
should also be included. Ideally, proteome databases should be accessible with
Macintosh or IBM personal computers and easy to use. Some proteome databases and
the areas they cover are listed in Table 3. Databases range from collections of
annotated gels to large databases of images integrated with protein and nucleic acid
sequence banks.

One example of an integrated proteome database is the suite of SWISS-PROT,
SWISS-2DPAGE and SWISS-3DIMAGE databases (Appel et al., 1993; Appel et al.,
1994; Appel, Bairoch and Hochstrasser, 1994; Bairoch and Boeckmann, 1994). The
features of these three databases are listed in Table 4. SWISS-PROT,
SWISS-2DPAGE and SWISS-3DIMAGE are accessible through the World Wide Web



Table 4: The SWISS-PROT, SWISS-2DPAGE and SWISS-3DIMAGE suite of crosslinked databases.
All three databases are accessible through the World Wide Web, at URL address:
http://expasy.hcuge.ch/



SWISS-PROT

Information: Text entries of sequence data; citation information; taxonomic data. *03 entries in Release 2v
Annotations: Protein function; post-translational modifications; domains; secondary structure; quaternary structure; diseases associated with protein; sequence conflicts
Cross-referenced databases: SWISS-2DPAGE, SWISS-3DIMAGE, EMBL, PIR, PDB, OMIM, PROSITE, Medline, FlyBase, GCRDb, MaizeDB, WormPep, DictyDB
Other features: Navigation to other SWISS databases achieved by selecting entries with computer mouse

SWISS-2DPAGE

Information: 2-D gel images of: human liver, plasma, HepG2, HepG2 secreted proteins, red blood cell, lymphoma, cerebrospinal fluid, macrophage like cell line, erythroleukemia cell, platelet
Annotations: Gel images where protein is found; how protein identified; protein pI and MW; protein number; normal and pathological variants
Cross-referenced databases: SWISS-PROT and all other databases accessible through SWISS-PROT
Other features: Gel images show position of identified proteins, or region of gel where protein should appear

SWISS-3DIMAGE

Information: Collection of *M) 3-D images of proteins
Annotations: All annotation is available in SWISS-PROT
Cross-referenced databases: SWISS-PROT and all other databases accessible through SWISS-PROT
Other features: Mono and stereo images available. Images can be transferred to local computer image viewing programs

(Berners-Lee et al., 1992), allowing any computer connected to the internet to access
the stored information and images. Navigation within and between the databases
is seamless, as all potential crosslinks are highlighted as hypertext on the screen and
can be selected with a computer mouse. From these databases, detailed information
about a protein, including amino acid sequence and known post-translational
modifications, can be obtained, the precise protein spot it corresponds to on a reference
image can be viewed if known, and the 3-D structure of the molecule can be seen if
available. References to nucleic acid and other databases are also given, to provide
access to information stored elsewhere.

'Organism' databases contain detailed protein and nucleic acid information.

These differ from nucleic acid or protein sequence databases like GenBank or
SWISS-PROT because they are image based, and contain information about chromosomal
map positions, transcription of genes, and protein expression patterns. The
Escherichia coli gene-protein database (VanBogelen, Hutton and Neidhardt, 1990;
VanBogelen and Neidhardt, 1991; VanBogelen et al., 1992), known as
ECO2DBASE, is one example. It contains 2-D gel

information (including pI and MW estimates, and spot identification), gene
information (GenBank or EMBL codes, chromosomal location, Kohara clone
(Kohara, Akiyama and Isono, 1987), transcription direction of genes), and protein
regulatory information (level of protein expression under different growth conditions,
member of regulon or stimulon). All entries in the ECO2DBASE are also
cross-referenced to the SWISS-PROT database (Bairoch and Boeckmann, 1994). It is
anticipated that organism databases will soon become a standard means of storing all
available information about a particular species. However, there is currently no
consistent manner in which organism databases are assembled, which may hamper
comparisons in the future.

Identification and characterisation of proteins from 2-D gels 

The number of proteins identified on a 2-D reference map determines its usefulness as
a research and reference tool. As most reference maps have only a small proportion of

roni _-D maps. ,n order ,o define then, as "known" in curren, nucleic acd and P r , cm 

ct n rdn^r 7' Pr0,C ' n idCn,ifiCa,i0n «" c-onr.rmat.on oFdn" 

open reading frames, and provides focus for DNA sequencinc project* and p^in 

^ KK^nm l, ° n T h> P0 ' n,,n? ,0 Pr ° ,C,nS ,ha ™ S ">" there m v b 

3000-4000 proteins from a single 2-D map that require identification, the challenge in
protein screening is to identify proteins quickly, with a minimum of cost and effort.
Traditionally, proteins from 2-D gels have been identified by techniques such as
immunoblotting, N-terminal microsequencing, internal peptide […],
comigration of unknown proteins with known proteins, or […]

c " ^ Wr!.sf h ' ^ l992:C < 1 ™*'- • 993: Honore ,/„/.. 1993: Carrel" 



Table 5: Hierarchical analysis for mass screening of proteins […].
Rapid and inexpensive techniques are used as a first step […]

Order  Identification technique                          Reference

1      Amino acid analysis                               Jungblut et al., 1992; Shaw, 1993;
                                                         Hobohm, Houthaeve and Sander, 1994; […]
2      Amino acid analysis with N-terminal sequence tag  Wilkins et al., submitted
3      Peptide-mass fingerprinting                       Mann, Hojrup and Roepstorff, 1993;
                                                         […]; Sutton et al., 1995
4      Combination of amino acid analysis and peptide    Cordwell et al., 1995;
       mass fingerprinting                               Wasinger et al., 1995
5      Mass spectrometry sequence tag                    Mann and Wilm, 1994
6      Extensive N-terminal Edman microsequencing        Matsudaira, 1987
7      Internal peptide Edman microsequencing            Rosenfeld et al., 1992;
                                                         Hellman et al., 1995
8      Microsequencing by mass spectrometry              Johnson and Walsh, 199[…]
       (electrospray ionization, post-source decay
       MALDI-TOF)
9      Ladder sequencing                                 Bartlet-Jones et al., 199[…]

alternatives to traditional approaches (Table 5; Wasinger et al., 1995). This […] the
use of rapid and cheap identification tools such as amino acid analysis and peptide
mass fingerprinting as first steps in protein identification, followed by the use of
slower, more expensive and time-consuming identification procedures if necessary. In
the construction of this hierarchy the analysis […]
of the data created has been […]
machine time per sample, the analysis of data can […] be quite […] and time
consuming. Amino acid analysis and peptide mass fingerprinting […]
techniques in the hierarchy are discussed […]
identification techniques in Table 5, see […]

PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION

[…] identify it by comparison […] amino acid composition profile to […]
[…] of proteins in databases. The amino acid composition of proteins […]
[…] electrophoresis (Garrels et al., 1994; Frey et al., 199[…]) or by […]
[…] chromato-
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s?oi ecili-sik 



Asx: […]    Glx: […]    Ser: 5.[…]  His: 0.7
Gly: 5.4    Thr: 3.6    Ala: 6.7    Pro: 7.9
Tyr: 1.3    Arg: 5.0    Val: 8.0    Met: 0.3
Ile: 5.9    Leu: 6.0    Phe: 13.3   Lys: […]

pI estimate: 6.89     Range searched: (6.64, 7.14)
Mw estimate: 16800    Range searched: (13440, 20160)

Closest SWISS-PROT entries for the species ECOLI matched by AA composition:

Rank  Score  Protein     pI    Mw     Description
1     24     PYRI_ECOLI  6.84  16989  ASPARTATE CARBAMOYLTRANSFERASE […]
2     39     COAA_ECOLI  6.32  36359  PANTOTHENATE KINASE (EC 2.7.1.33)
3     40     META_ECOLI  5.06  35713  HOMOSERINE O-SUCCINYLTRANSFERASE
4     42     […]         5.52  37812  TRANSCRIPTIONAL ACTIVATOR […]
5     43     HLYC_ECOLI  8.56  19769  HEMOLYSIN C, PLASMID

Closest SWISS-PROT entries for ECOLI with pI and Mw values in specified range:

Rank  Score  Protein     pI    Mw     Description
1     24     PYRI_ECOLI  6.84  16989  ASPARTATE CARBAMOYLTRANSFERASE […]
2     102    […]         6.73  17921  TRAJ PROTEIN
3     112    […]         6.79  19028  HYPOTHETICAL LIPOPROTEIN YAJC
4     140    […]         6.83  14945  HYPOTHETICAL 14.9 KD PROTEIN IN CRPE […]
5     142    YAHA_ECOLI  7.06  14726  HYPOTHETICAL PROTEIN IN BETT 3' REGION

Figure 4. Computer printout from the ExPASy server where the empirical amino acid composition,
[…] of a protein from a 2-D map of E. coli were matched against all entries in
SWISS-PROT for E. coli. The correct identification, aspartate carbamoyltransferase, is shown in bold. Low
scores indicate a good match. Note how matching within a defined pI and MW range (lower set of matches)
has greatly increased the score difference between the first and second ranking protein. This score
difference gives high confidence in the identification, and is only observed where the top ranking protein
is the correct identification (Wilkins et al., 1995).

graphy-based analysis. Proteins blotted to PVDF membranes can be hydrolysed in 1 h
at 155°C, amino acids extracted in a single brief step, and each sample automatically
derivatised and separated by chromatography in under 40 minutes (Wilkins et al.,
1995; Ou et al., 1995). In this manner, one operator can routinely analyse 100 proteins
per week on one HPLC unit. This technology lends itself to automation, and it is
anticipated that instruments with even greater sample throughput will be developed.
When proteins have been prepared by micropreparative 2-D electrophoresis (Hanash
et al., 1991; Bjellqvist et al., 1993b), blotted to a PVDF membrane and stained with
amido black, any visible protein spot is of sufficient quantity for amino acid analysis
(Cordwell et al., 1995; Wasinger et al., 1995; Wilkins et al., 1995).

After the amino acid composition of a protein has been determined, computer
programs are used to match it against the calculated compositions of proteins in
databases (Eckerskorn et al., 1988; Sibbald, Sommerfeldt and Argos, 1991; Jungblut
et al., 1992; Shaw, 1993; Hobohm, Houthaeve and Sander, 1994; Wilkins et al.,
1995). Matching is usually done with only 15 or 16 amino acids, as cysteine and



Composition:

Asx: 5.[…]  Glx: 10.8   Ser: 4.1    His: 2.7
Gly: 12.2   Thr: 2.8    Ala: 11.9   Pro: 3.2
Tyr: […]    Arg: 3.7    Val: 9.5    Met: 0.6
Ile: 5.1    Leu: 6.2    Phe: 3.2    Lys: 4.9

pI estimate: 5.99     Range searched: (5.74, 6.24)
Mw estimate: 45000    Range searched: (36000, 54000)

Closest SWISS-PROT entries for ECOLI with pI and Mw values in specified range:

Rank  Score  Protein     pI    Mw     N-terminal Seq.
1     21     […]         6.03  45316  […]
2     22     […]         5.86  36502  […]
3     38     GABT_ECOLI  5.78  45774  […]
4     44     […]         5.86  48018  […]
5     45     […]         5.98  4[…]   […]
6     46     […]         5.79  […]    […]
7     46     […]         5.78  37851  […]
8     47     […]         5.98  49162  […]
9     47     […]         5.85  43290  […]
10    50     […]         6.01  37064  […]

Figure 5. A PVDF protein spot from an E. coli 2-D reference map was […]
[…] then subjected to amino acid analysis. The N-terminal sequence was […]. The amino
acid composition of the spot, as well as estimated pI and MW, were matched […]
SWISS-PROT for E. coli, and the above list of best matches was produced. […] a
large score difference between the first and second ranking proteins […] confidence in
the protein identification. However, the sequence tag […]

tryptophan are destroyed during hydrolysis, asparagine and glutamine are deamidated
to their corresponding acids, and proline is not quantified in some analysis systems.
The computer programs produce a list of best matching proteins, which are ranked by
a score that indicates the match quality. Some programs allow matching to be […]
[…] to protein database entries for one species (Jungblut
et al., 1992; Wilkins et al., 1995). The use of such restrictions increases the […] of
matching. An example of protein identification by amino acid composition is shown
in Figure 4. To date, amino acid composition has been used to identify proteins from
reference maps of Spiroplasma melliferum, Mycoplasma genitalium, E. coli, Saccharomyces
cerevisiae, Dictyostelium discoideum, human sera, human heart, human
lymphocyte, and mouse brain (Cordwell et al., 1995; Wasinger et al., 1995; Wilkins
et al., 1995; Jungblut et al., 1992, 1994; Garrels et al., 1994).
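
The matching step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program behind the printouts in Figures 4 and 5: the scoring function (a sum of squared differences in molar percent) is assumed only because the text says that low scores indicate a good match; the default windows of 0.25 pI units and 20% of the estimated MW mirror the "Range searched" values in those printouts; and the `Entry` class, the `rank_matches` function and the toy database are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One database protein with its calculated properties (hypothetical layout)."""
    name: str
    pi: float
    mw: float
    composition: dict  # residue name -> molar percent


def composition_score(observed, theoretical):
    """Assumed score: sum of squared differences over the residues that were measured.

    Only residues present in `observed` are compared, so residues lost during
    hydrolysis (for example Cys and Trp) are simply left out of the sample dict.
    """
    return sum((observed[aa] - theoretical.get(aa, 0.0)) ** 2 for aa in observed)


def rank_matches(observed, database, pi=None, mw=None, pi_window=0.25, mw_fraction=0.20):
    """Return (score, name) pairs sorted so that the best (lowest) score comes first.

    If `pi` or `mw` estimates are given, entries outside the window are skipped.
    """
    candidates = []
    for entry in database:
        if pi is not None and abs(entry.pi - pi) > pi_window:
            continue
        if mw is not None and abs(entry.mw - mw) > mw_fraction * mw:
            continue
        candidates.append((composition_score(observed, entry.composition), entry.name))
    return sorted(candidates)


# Toy data, invented for illustration only.
toy_db = [
    Entry("A", 6.8, 17000, {"Ala": 10.0, "Gly": 5.0}),
    Entry("B", 6.3, 36000, {"Ala": 9.0, "Gly": 6.0}),
    Entry("C", 6.9, 18000, {"Ala": 2.0, "Gly": 12.0}),
]
sample = {"Ala": 9.5, "Gly": 5.5}
unrestricted = rank_matches(sample, toy_db)
restricted = rank_matches(sample, toy_db, pi=6.89, mw=16800)
```

In the toy data, entries A and B tie on composition alone, and the pI and MW window removes B. This is the effect the Figure 4 caption describes: restricting the search widens the score gap between the first and second ranked proteins.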

PROTEIN IDENTIFICATION BY AMINO ACID COMPOSITION AND N-TERMINAL
SEQUENCE TAG

When samples from 2-D gels are not unambiguously identified by amino acid
composition, pI and MW, often the correct […]
[…] sequence tag concept […] a combined
Edman degradation and amino acid analysis approach […]
[…] PVDF-blotted protein by Edman degradation for 3 or 4 […]
[…] which the same sample is used for amino acid […]
removed from the protein, its composition is […]
since only a small amount of protein sequence is […]
Edman degradation cycles can be used. […]
allow 3 cycles to be completed in 1 h, thereby […] screening of […] or […]
proteins per week on one automated, multi-[…]. […] composition,
pI and MW of proteins are matched […]
N-terminal sequences of best matching proteins are […] checked with the sequence tag
to confirm the protein identity (Figure 5). This technique will be less useful when
proteins are N-terminally blocked […]
[…] the acetyl, formyl, or pyroglutamyl […] amino acids are […];
this may itself provide useful information […] identification. A strength
of N-terminal sequence tag and amino acid […] identification is that
data generated are […]

PROTEIN IDENTIFICATION BY PEPTIDE MASS FINGERPRINTING 

Techniques for the identification of proteins by peptide mass fingerprinting have
recently been described (Henzel et al., 1993; […]
[…], 1993; Mann, Hojrup and Roepstorff, 1993; […]
et al., 199[…]; Sutton et al., 1995). This involves the […] proteins
using residue-specific enzymes, the […], and the matching
of these masses against theoretical […] from protein
sequence databases. As proteins have different amino acid sequences, their peptides
should produce characteristic fingerprints. […]
digests are reported to produce […]
the subsequent peptide mass analysis ([…]; Rasmussen et al., […];
Mortz et al., 1994). The enzyme of choice […] (
modified sequencing grade), but other enzymes […] V8 protease, have
also been used (Pappin, Hojrup and […]). To maximise the number of
peptides obtained, it is desirable for proteins […] to be reduced and alkylated prior
to digestion (Mortz et al., 199[…]). This […] that all disulfide
bonds of the protein are broken, […] conformations that are more
amenable to digestion. Surprisingly, chemical […] methods such as cyanogen
bromide (methionine specific), […] specific), and
[…]nitrophenylsulfenyl[…]-3-methyl-[…]bromo[…] have not
been explored as […]
([…], 1979; Crimmins et al., 1990; […]).
After proteins are digested, peptide masses are […]
[…] MALDI-[…] because of
its higher sensitivity and greater tolerance […] from 2-D […]
(James et al., 199[…]; Mortz et al., […]). […]
more recent modifications […]
difficulties experienced with the calibration […]
(Vorm and Mann, 199[…]; Vorm, Roepstorff and […]). […]
mass spectrometry allows a small fraction of a digest […] to be used
for analysis, and analysis itself is complete in a few minutes.

A major challenge associated with peptide mass fingerprinting is data interpretation
prior to computer matching against libraries of theoretical […]
must be examined carefully to determine which peaks represent peptides of
interest, as there are often enzyme autodigestion products […] contaminating
substances present (Henzel et al., 1993; Mortz et al., 1994; […], 199[…]).
Furthermore, if protein alkylation and reduction has not been […]
protein digestion, peptide sequence coverage may be poor […]
[…] presence of post-translational modifications […]
unmodified peptides alone can be very difficult to […]
A number of computer programs are available for matching peptide masses against
databases (reviewed in Cottrell, 1994). Matching is usually undertaken in an iterative
manner, whereby peaks of mass 500-5000 Da are selected and matched under
various search parameters including MW of protein, mass accuracy of peptides and
number of missed enzyme cleavages allowed (Henzel et al., 1993; Mortz et al., 1994;
Rasmussen et al., 1994). The correct protein identity is the protein which has the most
peptide masses in common with the unknown sample. Identities have been established
with as few as three peptides, but unambiguous identification is thought to require a
mass spectrometry map covering most peptides of the protein (Mortz et al., 199[…];
Yates et al., 1993). To date, peptide mass fingerprinting of proteins has been
undertaken from the human myocardial protein and keratinocyte maps, from an E. coli
2-D gel, and from reference maps of Spiroplasma melliferum and Mycoplasma
genitalium (Sutton et al., 1995; Rasmussen et al., 1994; Henzel et al., 1993; Cordwell
et al., 1995; Wasinger et al., 1995), although the technique is most powerful when
used in combination with another protein identification technique (Rasmussen et al.,
1994; Cordwell et al., 1995).
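
The search parameters listed above (a 500-5000 Da mass window, a mass accuracy, and a number of allowed missed cleavages) can be made concrete with a short Python sketch. It is an illustration under assumptions, not any of the programs reviewed by Cottrell: it assumes cleavage after K or R except before P (a common approximation of trypsin specificity), uses standard monoisotopic residue masses, and scores a protein simply by how many observed masses it explains. The function names, the 0.5 Da default tolerance and the two test sequences are hypothetical.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_peptides(sequence, missed_cleavages=1):
    """In-silico digest: cut after K or R unless the next residue is P."""
    cuts = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))
    peptides = set()
    for i in range(len(cuts) - 1):
        # Joining up to `missed_cleavages` extra fragments models incomplete digestion.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(cuts))):
            peptides.add(sequence[cuts[i]:cuts[j]])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tolerance=0.5,
                  mass_range=(500.0, 5000.0), missed_cleavages=1):
    """Number of observed masses explained by at least one theoretical peptide."""
    low, high = mass_range
    theoretical = [m for m in map(peptide_mass, tryptic_peptides(sequence, missed_cleavages))
                   if low <= m <= high]
    return sum(any(abs(obs - m) <= tolerance for m in theoretical)
               for obs in observed if low <= obs <= high)


def rank_proteins(observed, database, **params):
    """Sort proteins by the number of matching peptide masses, best first."""
    scored = [(name, count_matches(observed, seq, **params)) for name, seq in database.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Invented sequences; the "observed" masses are simulated from P1 itself.
toy_db = {"P1": "GASPVTKLLNDQEMRHFYWAGK", "P2": "MKTAYIAKQRQISFVK"}
observed = [peptide_mass(p) for p in tryptic_peptides(toy_db["P1"], missed_cleavages=0)]
ranking = rank_proteins(observed, toy_db)
```

Real spectra also contain autodigestion and contaminant peaks, and modified peptides shift in mass, so a count of this kind is only a starting point. That limitation is the data-interpretation problem raised at the start of this passage.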



MASS SPECTROMETRY SEQUENCE TAGGING 

An extension of peptide mass fingerprinting has recently been described, called
peptide sequence tagging (Mann and Wilm, 1994; Mann, 1995). This uses tandem
mass spectrometry (MS/MS) to initially determine the mass of peptides, then subject
them to fragmentation by collision with a gas, and finally determine the mass of
fragments. The resulting spectra give information about a peptide's amino acid
sequence. The fragmentation masses of peptides can rarely be used to assign a complete
sequence, but they usually allow a short 'sequence tag' of 2 or 3 amino acids to be
determined. This sequence tag and the original peptide mass are matched by computer
against a database, providing a likely identity of the peptide and the protein it came from.
The major drawback for this technique as a mass screening tool is the complexity of the
mass data generated and the high level of expertise required for its interpretation.
Nevertheless, it represents a useful new protein identification method which greatly
increases the power of peptide mass fingerprinting protein identification.
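
The combined use of a peptide mass and a short tag can be sketched as a simple filter. This is a simplified illustration only: it takes the two pieces of information named in the text (the original peptide mass and a 2 or 3 residue tag) and ignores any further constraints a real search would apply. The function name, the precomputed `digest_table` of (protein, peptide, mass) rows, the tolerance and the masses are all hypothetical. Because the reading direction of a fragment series may not be known, the sketch accepts the tag in either orientation, which is also an assumption.

```python
def match_sequence_tag(peptide_mass, tag, digest_table, tolerance=0.5):
    """Return (protein, peptide) rows whose mass fits and whose sequence contains the tag.

    `digest_table` is a list of (protein, peptide, mass) tuples produced beforehand
    by an in-silico digest of the sequence database.
    """
    hits = []
    for protein, peptide, mass in digest_table:
        mass_ok = abs(mass - peptide_mass) <= tolerance
        tag_ok = tag in peptide or tag[::-1] in peptide  # either reading direction
        if mass_ok and tag_ok:
            hits.append((protein, peptide))
    return hits


# Invented rows: P1 and P3 are isobaric (same residues, different order).
toy_table = [
    ("P1", "GASPVTK", 658.37),
    ("P2", "TAYIAK", 665.37),
    ("P3", "GASVPTK", 658.37),
]
by_mass_only = [(prot, pep) for prot, pep, m in toy_table if abs(m - 658.37) <= 0.5]
with_tag = match_sequence_tag(658.37, "SPV", toy_table)
```

In the toy table, mass alone cannot separate the two isobaric peptides, whereas the three-residue tag leaves a single candidate.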

Cross-species protein identification 

Protein sequence databases continue to grow at a rapid rate, yet it is not widely
appreciated that close to 90% of all information contained in current protein databases
comes from only 10 species (A. Bairoch, Pers. Comm.). Fortunately, this information
can be used to study proteomes of organisms that are poorly defined at the molecular
level, via 2-D electrophoresis and 'cross-species' protein identification (Cordwell et
al., 1995; Wasinger et al., 1995). This approach allows proteins from reference maps
of many different species to be identified without the need for the corresponding genes
to be cloned and sequenced. This is particularly true for 'housekeeping' proteins such
as enzymes involved in glycolysis, DNA manipulation and protein manufacture,
which are highly conserved across species boundaries. Proteins that cannot be
identified across species boundaries can then become the focus of further protein
characterisation and DNA sequencing efforts.

Rapid cross-species identification of proteins from 2-D reference maps can be
undertaken with amino acid composition or peptide mass fingerprinting methods
(Figure 6), but these techniques alone may not identify proteins unambiguously when
phylogenetic cross-species distances are great or analysis data is of poor quality (Yates
et al., 1993; Shaw, 1993; Cordwell et al., 1995). However, very high confidence in
protein identities can be achieved when lists of best-matching proteins generated by
both techniques are compared (Cordwell et al., 1995; Wasinger et al., 1995). A
correct identification is found when the same protein is ranked highly in lists of best
matches generated by both techniques. This method has allowed approximately 120
proteins from the reference map of the mollicute Spiroplasma melliferum, representing
approximately one quarter of the proteome, to be confidently identified by
reference to protein information from other species (S. Cordwell, Personal Communication).
When cross-species protein identification is to be undertaken, it should be
noted that the molecular weight of a protein type across species is usually highly
conserved, but that protein pI can vary by more than 2 units (Cordwell et al., 1995).
Accurate molecular weight determination by direct mass spectrometry of proteins
blotted to PVDF (Eckerskorn et al., 1992) should therefore be a useful additional
parameter for cross-species protein identification.
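
The comparison of the two best-match lists can be sketched as follows. This is one possible reading of "ranked highly in both lists", not a procedure given in the text: the cut-off `top_n`, the use of the rank sum for ordering, and the function name are assumptions, and the protein names are placeholders.

```python
def consensus_candidates(composition_ranking, fingerprint_ranking, top_n=10):
    """Proteins appearing in the top `top_n` of both lists, ordered by combined rank.

    Each argument is a list of protein names, best match first.
    """
    rank_a = {name: i for i, name in enumerate(composition_ranking[:top_n], start=1)}
    rank_b = {name: i for i, name in enumerate(fingerprint_ranking[:top_n], start=1)}
    shared = set(rank_a) & set(rank_b)
    # Lower rank sum first; the name breaks ties so the output is deterministic.
    return sorted(shared, key=lambda name: (rank_a[name] + rank_b[name], name))


# Placeholder names, for illustration only.
by_composition = ["X", "Y", "Z", "W"]
by_fingerprint = ["Y", "Q", "X", "R"]
agreed = consensus_candidates(by_composition, by_fingerprint, top_n=3)
```

Because protein pI can shift by more than 2 units between species while MW is usually conserved, a cross-species search of this kind would plausibly apply an MW window but leave pI unrestricted when each list is generated.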

CHARACTERISATION OF POST-TRANSLATIONAL MODIFICATIONS 

Many proteins are modified after translation. Such post-translational modifications,
including glycosylation, phosphorylation, and sulfation (see Table 6), are usually
necessary for protein function or stability. Some abnormal modifications are associated
with disease (Duthel and Revol, 1993; Ghosh et al., 199[…]; Yamashita et al.,
1993). In proteome studies, post-translational modifications can be examined on all
proteins present, or on individual spots. Studies on all proteins provide an indication
of which proteins may carry a certain type of modification. For example, 2-D gel
analysis of cell cultures grown in the presence of [3H]mannose or [32P]phosphate
gives an indication of which proteins carry glycans containing mannose, and which
proteins are phosphorylated (Garrels and Franza, 1989). Lectin binding studies of 2-D
gels blotted to PVDF or nitrocellulose provide information on the saccharides, if any,
that are carried by proteins present (Gravel et al., 1994).

When individual proteins of interest carrying post-translational modifications have
been found, micropreparative 2-D electrophoresis can be used to purify them in
microgram quantities (Hanash et al., 1991; Bjellqvist et al., 1993b). If protein
isoforms of similar MW and pI are to be studied, focusing with narrow range pI
gradients (1 pH unit) can provide greater separation and resolution. After electrophoresis,
the type and degree of protein phosphorylation can be investigated (Murthy
and Iqbal, 1991; Gold et al., 1994), monosaccharide composition can be determined
(Weitzhandler et al., 1993; Packer et al., 1995), and the structure and exact site of
glycoamino acids can be investigated by either Edman degradation based techniques
or by mass spectrometry (Pisano et al., 1993; Huberty et al., 1993; Carr, Huddleston
and Bean, 1993). With further development of rapid techniques, investigation of
phosphorylation and monosaccharides by chromatographic or mass spectrometry
means is likely to become a routine step in the characterisation of post-translational
modifications of proteins from reference maps.

The status of proteome projects

Many technical aspects of proteome research have already been discussed in […]

FUTURE DIRECTIONS OF PROTEOME PROJECTS

This review has described recent advances in the area of proteome research. It has
illustrated how new developments of older techniques (2-D electrophoresis and amino
acid analysis) as well as the applications of new technology (mass spectrometry) have
greatly widened the choice of tools the biologist and protein chemist has for the
separation, identification and analysis of complex mixtures of proteins. This has made
possible the establishment of detailed reference maps for organisms, which are
becoming the method of choice for the definition of tissues or whole cells, and the
investigation of gene expression therein.

Proteome projects are already impacting on the dogma of molecular biology that
DNA sequence constitutes the definition of an organism. For example, the proteomes
of different tissues of a single organism are often significantly different. Similarly,
cross-species identification of proteins (for example the identification of proteins
from Candida albicans by comparison with S. cerevisiae) can open up studies on
organisms that are poorly molecularly defined. As cross-species identification can
proceed at a pace orders of magnitude faster than a genome project in terms of
defining the gene and protein complement of organisms, the need for the DNA
sequencing of genomes will be avoided, and emphasis placed on those found to be
novel.

Just as genome sequencing is not an end in itself, neither is an annotated 2-D protein
reference map of an organism, nor indeed the identification of proteins in a proteome.
So whilst an immediate aim of proteome projects is to screen proteins in reference
maps, this will lead to expression studies and characterisation of post-translational
modifications. The challenge that then needs to be addressed is the investigation of
structure and function of proteins in a proteome. The magnitude of this is illustrated by
the fact that over half the open reading frames identified in S. cerevisiae chromosome
III were initially of no known function (Oliver et al., 1992). Structural and functional
studies will be an undertaking just as formidable as genome studies are now and
proteome projects are becoming, but will lead to an unimaginably detailed understanding
of how living organisms are constructed and how they operate.
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METHODOLOGY 



Human cellular protein patterns and their link to genome DNA 
sequence data: usefulness of two-dimensional gel 
electrophoresis and microsequencing 



ABSTRACT Analysis of cellular protein patterns by
computer-aided 2-dimensional gel electrophoresis together
with […] in protein analysis have
made possible the establishment of comprehensive
2-dimensional gel protein databases that may link protein
and DNA information and that offer a global approach
to the study of the cell. Using the integrated approach
offered by 2-dimensional gel protein databases it
is now possible to reveal phenotype specific protein […],
to […] them, to search for homology
with previously identified proteins, to clone the cDNAs,
[…] partial protein sequence to genes for which the
full DNA sequence and chromosome location is
known, and to study regulatory properties and function
of groups of proteins that are coordinately expressed
in a given biological process. Human 2-dimensional gel
protein databases are becoming increasingly important in
view of the concerted effort to map and sequence the […]

[…]; Honoré, B.; Gesser, B.; Dejgaard, K.;
[…], J. Human cellular protein patterns and
their link to genome DNA sequence data: usefulness of
[…]

Key Words: human · 2-dimensional gel […]
· microsequencing · […]
· genome mapping and […]

PROTEINS SYNTHESIZED FROM information contained in the
DNA orchestrate most cellular functions. The total number
[…]
although current estimates range from 3000 to 6000. Of
these, as many as 70% perform household […]
[…] to be shared by all cell types irrespective
[…]. There are many different cell types in the human
body with perhaps 30,000 to 50,000 proteins expressed
[…] as a whole, judged from the fact that about
[…] of the […] genome corresponds to genes. Today only
a small fraction of the total set of proteins has been identified;
[…] known about the protein patterns of individual
cell types […] variation under […] and abnormal
conditions.

[…] high […] 2-dimensional gel
electrophoresis […] the technique of choice to
determine the protein composition of a given cell type and for
monitoring changes in gene activity through […]
orchestrate various cellular functions (refs 1-6 and references
therein). The technique, originally described by O'Farrell […],
separates proteins in terms of their isoelectric point (pI) and
molecular weight. Usually one chooses a condition of
[…] reveals the global protein behavioral
response, as all detected proteins can be analyzed both
qualitatively and quantitatively in relation to each other. At
present most available 2-dimensional gel techniques (regular
gel format) can resolve between 1000 and 2000 proteins
from a given mammalian cell type, a number that corresponds
to about 2 million base pairs of coded DNA. […]

Two-dimensional gel electrophoresis has been widely applied
to analysis of cellular protein patterns from bacteria to mammalian
cells (refs 1-6, and references therein). In spite of
much work, […] information gathered from these
studies has not reached the scientific community in its […]
[…] of standardized gel systems and […]
of means […] storing and communicating protein information.
Only recently, because of the development of appropriate
computer software (7-13), has it been possible to […]
[…] numbers to individual proteins, and store the
wealth of information in quantitative and qualitative comprehensive
2-dimensional gel protein databases ([…]),
i.e., those containing information about the various properties
(physical, chemical, biological, biochemical, physiological,
genetic, immunological, architectural, etc.) of all the
[…]
[…] 2-dimensional gel protein databases offer an easy
and standardized medium in which to store and communicate
[…] information and provide a unique framework in
which to focus a multidisciplinary approach to study the cell.
Once a protein is identified in the database, all of the information
accumulated can be easily retrieved and made available
[…]. In the long run, protein databases are
expected […] a wide variety of biological information
that […] instrumental to researchers working in many
areas of biology, among others, cancer and oncogene
studies, differentiation, development, drug development,
[…], and diagnosis of genetic and clinical
[…]

The approach using systematic 2-dimensional gel protein
[…] has recently gained a new dimension with the advent
of techniques to microsequence major proteins recorded
in the databases (refs 24-42 and references therein). Partial
protein sequences can be used to search for protein identity
as well as to prepare specific DNA probes for cloning as-yet-uncharacterized
proteins (Fig. 1). As these sequences can be
stored in the database (see for example Fig. 2H), they offer
a unique opportunity to link information on proteins with
the existing or forthcoming DNA sequence data on the human
genome (Fig. 1) (20, 36, 39).

Figure 1. Interface between partial protein sequence databases,
comprehensive 2-dimensional gel databases, and the human genome
sequencing project. Appropriate software is required to compare
protein and DNA sequences. In general, although the inference
of a protein's sequence from the DNA sequence (thick arrow)
is direct and unambiguous, the DNA sequence can only be inferred
approximately from the protein sequence (thin arrow) and cloning
the gene requires either a cDNA or the requisite group of oligonucleotide
probes deduced from the partial amino acid sequence.
Modified from ref 6.

Using the integrated approach offered by comprehensive
2-dimensional gel databases (Fig. 1), it will be possible to
identify phenotype-specific proteins; microsequence them
and store the information in the database; search for homology
with previously characterized proteins; clone the
[…] partial protein sequences to genes for which
the full DNA sequence and the chromosome location are
known; and study the regulatory properties and function of
groups of proteins (pathways, organelles, etc.) that are coordinately
expressed in a given biological process. Comprehensive
2-dimensional gel protein databases will depict an
[…] of […]
thousands of protein components of organelles, pathways,
and cytoskeletal systems in both physiological and abnormal
conditions and are expected to lead to identification of new
[…] in different cell types and organisms. In
the […], 2-dimensional gel protein databases may be
[…] to each other as well as to national and international
specialized databanks on nucleic acid and protein sequences,
[…] structures, NMR experimental data, […], etc.

A few 2-dimensional gel protein databases that are accessible
[…] have been published in extenso: these
correspond to the protein-gene database of Escherichia coli
K-12 developed by Neidhardt and colleagues (14, 23), the rat
[…] database established by Garrels and co-workers at
Cold […], and a few human databases
(transformed […] cells [15, 20], normal embryonal lung
MRC-5 fibroblasts [17], keratinocytes [19] and peripheral
blood mononuclear cells [15]) developed in Aarhus. Given
space limitations and to keep this review in focus we will
concentrate on the computerized analysis of human cellular
[…] patterns, in particular on the steps in
[…] comprehensive 2-dimensional gel
databases that can link protein and DNA information.



MAKING AND MANAGING A COMPREHENSIVE
2-DIMENSIONAL GEL DATABASE OF HUMAN
CELLULAR PROTEINS

The first step in making a comprehensive 2-dimensional gel
protein database is to prepare a synthetic image […]
[…] stained gel) to be used as a standard or master reference.
This can be done with laser scanners, charge couple device
[…] rotating drum
scanners, and multiwire chambers (13). Computerized analysis
[…] spot detection, quantitation, pattern matching,
database making, […] and retrieval of information
[…] described in the literature
(ELSIE […]; […] Large Scale Biology, Rockville, Md.;
Visage, BioImage Corporation, Ann Arbor, Mich.; Gemini,
Joyce Loebl, Gateshead; Microscan […], Technology
[…] Nashville; and MasterScan, […]).
Unfortunately, most of these systems are incompatible.

wit I h a ° U ^ V l' 0rk 1 Sta ri 0n in Aarhus - flu °rograms are scanned 
»Hh a Molecular Dynamics laser scanner and the data " re 

EStac 7735 PDQUEST " S0f,W = re (P-«n Sat - 
FC Vp" f om & ° n * spark *™°n computer 4100 
a r-J from SUN Microsvstems, Inc. The scanner 

subst hzi dei^K^^KHhSl'S 

Gaussian distribution to spot centers. Spot intensity is
calculated as the integration of a fitted Gaussian. If calibration
strips containing individual segments of a known
amount of radioactivity are used, it is possible to merge multiple
exposures of the sample image into a single image of
greater dynamic range. Once the synthetic image is

monitor 6 S,0rC , d ° n di5k and dh P la y ed *«d» ofthe 
mon.tor. Functions that can be used to edit the imatres in- 
elude.. cance] (f cxanip)e tQ erase • 

been interpreted as spots by the computer; cancel sTreaks or 
ow dpm spots), combine (sometimes a spot mav be Solved 

add sooTto th P A Ckcd Sp ° tS) ' res,0re < ""combine and 
add spot to the gel. The process is time consuming: about
1-1/2 days per image. Edited standard images can be matched
to other synthetic images. Figure 2A shows a portion of a

Slmel synthetic image (IEF) of a 
[ S]me«h.on.ne labeled cellular proteins from human AM A 

h blicT St w da J abaSe) (20) Im3 S es can displayed either 
n cl , and W ^' ,C < rCSembli "S original fluorograms) or 

shown in Fig. 2B, each polypeptide is assigned a number by
the computer, which facilitates the entry and retrieval of
m t«i C (5S d ?i antitat r 'formation for any" ve^ po ° 

The standard image can be matched automatically
by the computer to other standard or reference gels
(Fig. 2C: matching of AMA cellular proteins [left] to MRC-5
proteins [right]), provided a few landmark spots are

manually as reference (indicated with a ♦ ink oA f D m " 
mate the process. s " ' ° In , 



'Abbreviations: CCD. charge couple device- PCW Bm rr 
inece nuclear anticr.n. uoi r* u- i. *"\ UOKC - rv-.NA. prolifcrat- 
raphv. 8 • HPLC - h ' sh P"ft«™ancc liquid chr«mato«- 
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Figure 2. A) Synthetic image of a fraction of an TFF «»i „r .u 

a» R ncd ,o each spot. Q cLp^^^S£ l^ZZT^ " ^ *> As » A bu, showing number, 

Ma.ched protein, are indicated by a ♦ or by J^^^J^^™ 1 * ^ MRC " 5 fibrob, "« ' > ™ P™eins p„« ™ 
ca.egor.es arable in the master AM A database can be ansfer^ed DsSZ* Pr ° ,e ' n r T ' ~n.» iM d in .he' various 

n.nc labeled pro.eins from normal human MRC-5 fibrobfas Th, £„„ •l™? ° f 3 fraC "° n ° fan ,EF ««">"*ram of ["Slmchic,- 

bar, and SV40 transformed MRC-5 (right £^£^^^^^7. ^ f ° f ° f a '~ P™™ in Wis 5, 

f If T ^" USC ann °« a,io " 'or *P°< allows the opcn^lVn^^T^"' 1 ,nf ° rma,, °" und " ^ «.cpon- glycolytic pathway. 
. Reiame Sundance of cytoskele.al and cv.oskeletal-related f «> c ?°nes and mformauon available for a (jiven protein 

- * *« — inforrnaUontt^.™ SV40-,an S ,ormed MRC^brX 
The automatic matching process, which has been described
in detail by Garrels et al. (13), takes about 5 min. Matched
proteins are indicated with the same letters in both gels (Fig.
2C). The usefulness of this function is emphasized by the fact
that data accumulated on common household proteins can
be easily transferred to any other human cell type
whose 2-dimensional gel cellular protein pattern is matched




to our standard AMA 2-dimensional gel protein image.
Alternatively, if the standard gel is part of a matchset (set of
gels in a given experiment), it can be used as a linker gel to
compare, for example, the quantitative values of a given
protein throughout the experiment (see Fig. 2D-; levels of some
proteins in normal and SV40 transformed human MRC-5
fibroblasts) or with other standard images in different sets of
cross-matched experiments (18, 22).
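The landmark-assisted matching and annotation transfer described above can be illustrated with a minimal sketch. This is not the PDQUEST algorithm; it assumes a simple per-axis linear distortion between gels, and the function names, spot coordinates, and tolerance are hypothetical.

```python
from math import hypot

def fit_axis(src, dst):
    """Least-squares line dst ~ a*src + b for one coordinate axis."""
    n = len(src)
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    a = cov / var
    return a, mean_d - a * mean_s

def match_spots(master, other, landmarks, tol=2.0):
    """Map master spot ids to spot ids in another gel.

    master, other: dict of spot_id -> (x, y)
    landmarks: (master_id, other_id) pairs entered by hand
    """
    ax, bx = fit_axis([master[m][0] for m, _ in landmarks],
                      [other[o][0] for _, o in landmarks])
    ay, by = fit_axis([master[m][1] for m, _ in landmarks],
                      [other[o][1] for _, o in landmarks])
    matches = {}
    for mid, (x, y) in master.items():
        px, py = ax * x + bx, ay * y + by  # predicted position in other gel
        best = min(other, key=lambda o: hypot(other[o][0] - px, other[o][1] - py))
        if hypot(other[best][0] - px, other[best][1] - py) <= tol:
            matches[mid] = best
    return matches

def transfer_annotations(annotations, matches):
    """Copy annotations from master spots to their matched spots."""
    return {matches[m]: note for m, note in annotations.items() if m in matches}

# Made-up coordinates: the second gel is a stretched and shifted copy.
master = {1: (10, 10), 2: (50, 20), 3: (30, 60), 4: (70, 70)}
other = {"a": (16, 7), "b": (60, 16), "c": (38, 52), "d": (82, 61), "e": (200, 200)}
matches = match_spots(master, other, landmarks=[(1, "a"), (2, "b"), (3, "c")])
transferred = transfer_annotations({4: "example annotation"}, matches)
```

A real matcher would also have to resolve cases where two master spots claim the same target spot; the sketch leaves that out for brevity.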
. Once a standard map of a riven protein «mni. j 
one can enter qualitative JL*^££g£> 

formed human amnion ceU (AMA) proteins (20) |£/SS> 
polypeptides of which 2592 correspond to cellular C om2? 
nems, having p F s ranglng f rora F 4 to 13 and moleX 
weights between 8.5 and 230 kDa. Thr m „„ 7k T 
«ins in the database corr^nd to Si acS flfSfSE 
protein; about 90 million molecules pTcdi) 5j£ "S 
lesser abundant of the recorded polypenddes ^2. 

ctKX."' 5000 m0,eCUkS ^ tmTSon 
categories we are using to establish the master AMA XT 

base mdude: 7) protein identification (S^Soi tS 

purified proteins, 2-dimensional immunobS 21 

quencing); 2) amounts (total «»ou»u3^ 0 f , SE* 

sis); 3) subcellular localization (nuclear. c^oSe^ met 

an^ me ^ brane rCCCpt0rS ' S P £dfic ^53S tcT « 
SL^f ,er ' ? P 05 "™* 1 "^ modification (phwJhorVla: 
tion, glycosylation, methylation etc); 6) micros™ n ri«2^ 
cell cycle specificity (specific ^SS^SS^Sli 

rnal and ^ d e^£^ rf jS5™ 

P«hwav /«T2S ° nCOgenes ' ^mponenS of Te" 

pathway (or pathways) that control cell proliferation V 
function (mainly from emigration with Dro,em?«?w ^ 
function); //) , ets of proteins^hat are coordSy 25S 
(hierarchy of controls, differential gene ex™;™. J?^- 
cells, etc.); 12) cDNAs (cloned cDNA?)-7S^^?r^ 

AMA database) displaying' the^nfo * NEPHGE 
gentry 

th ! information ,n that particular entry (Fig. 2F> 
"sine soccific u S '° nal "nmunoblotting 

base of E. col, K-12 nfS a ™£ * S ene -P rotei n data- 
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«■£ rt^^S* 550 a "^ i - f -m labor, 
P-ided by .eSS^ f« 

n^S? * e Infonna »on ^ may haw accumulated on th^ 
V, VAC-cHndonexi^^ ' .ASSESS"? 

2-d5n^!T- nti0 . ned , r io ^ one distinct advantage of 
in J ™ — ^ e,ectro Phoresis is the possibilirTof S md° 

gels that are suitable for computer analv,iW«?eal?na 

STfllSS^ 1 ^ ett) « -J " "llmUa 8 : 

mera ima~! , • ° mei . nced of Vibration strips to 

advanced quantitative studies published so far Sinl r«« 

workers (18. 22). In particular, these investigators haw estab- 
lished a quantitative rat protein database M» 99\VT • j 

SV40 iS^ ,ed * ^'"""ion of ra, REF52SL wiS 
SV40. adenovirus, and the Kirsten murine sarcoma viru" 

t« P „7f • " ng to confluence as well as groups of 

transformation-sensitive proteins that respond iTaSin 
ual fashion to transformation by DNA andRNA viruses^A 
most interesting feature of this quantitative datab^^l 

iar expression patterns as the ce 11 cycle-regulated DNA ™ 

s Se^^ 

.hrough appropSe^o^ 

ng this approach, we have recorded quaTt ativVchSges^n 

Sn,f n y ^ C K Zed by t > u,esccnt . Proliferating, and SVW 
Some d^ a ed humancmbr yo«- lung MRC-5 fibroblasts ^ 
oroTll 1 C ° nCern,n S cytoskeletal and cytoskeletal-related 

those of G? P . reS ^ Cd V S - 2G - ° ur studics ^ weU as 
S defiS 11? c f °- workers (18. 22) may in the long run 

of^ 

SSSase? IMENSIONAL gel protein 

d^S^SS^ therC ^ 0ther 2 - di ™«,ional gel 
aatabases available m computer form that have been pub- 
TABLE I. Some entries for lipocortin V in the
Kmries for lipoconin V (fEF SSP 8?J6 > 
1. Protein name
2. Percentage of total protein
3. Apparent molecular weight (mr)
4. Isoelectric point (pI)
5. Method (or methods) of identification
6. Credit to investigators that aided in identification
7. Antibody against protein
8. Comigration with human proteins
9. Cellular localization
10. Calcium/phospholipid-dependent membrane proteins
11. Function
12. Partial amino acid sequence
13. cDNA sequence
14. Levels in fetal human tissues



human AMA 2-dimensional gel protein database
Information entered 



O.MOfc (about 2.800.000 molecules per cell) 



15. Levels in quiescent, proliferating, and
transformed MRC-5 fibroblasts

16. Distribution in Triton supernatant and
cytoskeletons



33.3 kDa 
4.76 

Microsequencing, 2-dimensional immunoblotting, comigration

Polyclonal (rabbit, antibody „ 0 . 20). B. Pepinsky. BIOGEN. Cambria 
L.poconi„ V.N.C. Ah.. Howard Hughe, Medical Institu.c. WaMavm t, nivt . rsitv 
Subcortical membrane 
Lipocortin V

Adrenal glands - + * + : farain 
cerebellum - «. + + ; ear . + + + . cvc . 

lung - + + + ; meninges - + + + ; " 
mcsonephric tissue - + + + • 
striated musde - + ♦ + : pancreas 

.K ' AV ! SpIeen " + + + : s,omach 
submandibular gland - + + + • 

small intestine - «. + + ; lnvmus . + + + . 
thyroid gland - + ♦ + : tongue - + + ♦ • ' 
ureter - + + + 

Q (quiescent) - 1.1; P (proliferating) - 1.0;
T (SV40 transformed) - 0.3

Mainly supernatant 



lished in extenso: these correspond to the E. coli K-12
wSf^) (M - 23) '° tHe fat REF52 data " 

The E. coli K-12 cellular protein-gene database is perhaps
the most complete of all databases reported so far, and
eventually it should trace each protein back to its structural gene.
niorrnat 10n contained in this database includes: we J^. 
iun name protein name. EC number, gene name)- 
rtZT n f S rV SPOt desi ^ atio - (-> coordinated m 

odc ( 'r 0 "" 0 "- ph > s,cal ma P locaiion. Gcnebank 

code, sequence reference, location on Kohara clones)- bi- 

esiduirof'eth" 3 " 0 " (m °! eCU,ar PL ""Sr of 

ac 5 I totaf „ui r T * P Crccnt of 

acid total number of amino acids in a polvpeptide) and 

regulatory information (cellular „r P 0, >.P c P ua 5£ a™ 1 



?r^J° ,e ^ ge " 0mC aS wU as ,he development of im- 
proved methods to express cloned genes. 

abJm ™ REF52 . 2 - di ™««°«al gel protein database lists 

QUEST analysis system (18, 22). Included in this quantitative
database are 1) protein names (cytoskeletal and heat
shock proteins as well as various nuclear, mitochondrial, and
cytoplasmic proteins), 2) annotations (subcellular localization,
modification, recognition by specific antibodies,
Pro P ,dnT a t,0n ' N . H r'ennu»I ^necTc^^SSJS 
»uS ^ n q C ,nl ? rma, 1 ,0 r and ^fcrences to the litera- 
ture) J) protein sets (cytoskeletal proteins, phosphoproteins 
TjZ£T* rcNAifcydin-like propenies. e'tc and 
4) quantitative data (protein synthesis during growth
of normal REF52 cells to confluence and quiescence, and after
restimulation of growth-inhibited cells).

In addition to the 2-dimensional gel databases mentioned
so far, there are several smaller cellular databases being
established in human (normal human diploid fibroblasts,
lymphocytes, leukocytes, leukemic cells), mouse (NIH/3T3 cells,
T lymphocytes), Aplysia, yeast (Saccharomyces cerevisiae), plants
(wheat, barley, sorghum), and Euglena. Databases of tissue
proteins (brain, whole mouse, liver) and body fluid proteins
(plasma proteins, cerebrospinal fluid, urine, and milk) are
being established in several laboratories. The reader is
directed to the review by Celis et al. (4) for details and
references concerning these databases.



MICROSEQUENCING HAS ADDED A NEW
DIMENSION TO COMPREHENSIVE 
2-DIMENSIONAL GEL DATABASES: A DIRECT 
LINK BETWEEN PROTEINS AND GENES 

The development of highly sensitive amino acid gas-phase or 
liquid-phase sequenators (24), together with the establish- 
ment of efficient protein and peptide sample preparation 
methods, has opened the possibility to perform a systematic 
sequence analysis of proteins resolved by 2-dimensional gel
electrophoresis. Indeed, generated pieces of protein se- 
quences can be used to search for protein identity (compari- 
son with available sequences stored in databanks) as well as 
for preparing specific DNA probes for cloning of as yet
uncharacterized proteins (Fig. 1). In addition, partial protein
sequences can be stored in 2-dimensional gel databases (for
example, see Fig. 2H) and offer a unique link between pro- 
teins and genes (Fig. 1). 
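As a minimal sketch of the databank comparison mentioned above, a partial sequence can be looked up by exact substring search. The databank entries here are made-up sequences with hypothetical names, and real searches would normally tolerate mismatches and sequencing uncertainties, which this example does not.

```python
def find_matches(peptide, databank):
    """Return (entry_name, position) for every exact occurrence of a peptide.

    Positions are 1-based residue numbers, as is usual for protein sequences.
    """
    hits = []
    for name, seq in databank.items():
        start = seq.find(peptide)
        while start != -1:
            hits.append((name, start + 1))
            start = seq.find(peptide, start + 1)
    return hits

# Toy databank with invented sequences (not real proteins).
databank = {
    "hypothetical_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "hypothetical_B": "MSDNELQKLAEYIAKQRAG",
}
hits = find_matches("YIAKQR", databank)
```

Exact matching is the simplest possible design choice; it is enough to show how a short stretch of microsequenced residues can point to a stored entry.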

In the early 1970s gel electrophoresis was used to purify 
proteins for sequencing purposes (reviewed by Weber and 
Osborn in ref 25). Proteins were recovered by diffusion and 
sequenced by the manual dansyl-Edman degradation at the 
nanomole level. This technique was further refined by using 
electro-elution to recover proteins and by miniaturizing the 
system (26). This method has been used extensively, but 
showed increasing drawbacks (low yields, protein samples 
contaminated by free amino acids, and NH2-terminal
blocking) as the amounts of handled protein gradually became
smaller (e.g., at the 10 picomol level). 

Most of the problems referred to above have been 
minimized with the introduction of protein-electroblotting
procedures (27-32). When proteins are blotted on chemi- 
cally inert membranes, it is possible to sequence the immobi- 
lized proteins directly without additional manipulations. 
Thus, depending on the amount of bound protein and its
nature, this direct sequencing procedure generally yields
NH2-terminal sequences containing 10-40 residues. As such, this
technique was used to identify, by their NH2-terminal
sequences, differentially expressed major proteins from total
cellular extracts separated on 2-dimensional gels. A major 
difficulty encountered in this procedure is the occurrence of 
frequent artefactual blockage of the proteins. Several studies 
suggest that this phenomenon is mainly due to reaction with
contaminants (particularly unpolymerized acrylamide
present in the gel) and to a high dilution of the protein (low
concentration of the protein per unit membrane surface). In
addition to this primarily technical problem, many proteins
are blocked in vivo by acylation or by a pyrrolidone carboxylic
acid cap.

The problem of partial or complete NH2-terminal blockage
can be circumvented by generating internal amino acid
sequences. This is achieved by fragmenting the protein
present in the gel (gel in situ cleavage) or by cleaving it while
bound to the membrane (membrane in situ cleavage)
(33-35). In both cases, proteins are either cleaved in a restricted
way (e.g., by limited enzymatic digestion or by using
restrictive chemical cleavage conditions) or fragmented into
smaller peptides.
Of the different combinations examined, we had good
results by using exhaustive proteolytic digestion on
membrane-immobilized proteins. This method has been
described for Ponceau red-stained proteins on nitrocellulose
blots (34), for Amido black-stained Immobilon-bound
proteins, and for fluorescamine-detected proteins on glass fiber
membranes (35). The proteases used (trypsin, chymotrypsin,
or pepsin) cleave at multiple sites, generating small peptides
that elute from the blot into the digestion buffer, from which
they are purified by reversed-phase high performance liquid
chromatography (HPLC) before being sequenced individually.
Although each of these manipulations could be expected
to result in a reduced yield of final sequence information, we
were surprised that the peptides could be sequenced with
high efficiency. In our hands, this approach could be routinely
applied to gel-purified proteins available in amounts
ranging from 5 to 10 µg, and often yielded sequence information
covering more than 30% of the total protein. As
membrane-immobilized proteins are not homogeneously
digested, but rather show protease-sensitive regions next to
resistant regions, the number of peptides generated is much lower
than expected from the number of potential cleavage sites.
Consequently, HPLC peptide chromatograms are less complex
and most peptides can be recovered in pure form.
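To make the notion of "potential cleavage sites" concrete, the following sketch performs a theoretical tryptic digest. It assumes the commonly used simplification that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); this rule is not stated in the text, and the example sequence is invented.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence at every theoretical tryptic cleavage site.

    Assumed rule: cleave C-terminal to K or R, but not when followed by P.
    """
    peptides, start = [], 0
    last = len(sequence) - 1
    for i, residue in enumerate(sequence):
        if i == last:
            break  # no cleavage after the final residue
        if residue in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides

# Invented sequence used only to exercise the rule.
peptides = tryptic_peptides("MAKRPLLKGGR")
```

The list returned here is the theoretical maximum; as noted above, digestion of membrane-bound protein produces fewer peptides because some regions resist the protease.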

As only limited amounts of a protein mixture can be
loaded on a 2-dimensional gel, proteins of interest are often
obtained in yields insufficient for the currently available
sequencing technology. More material can be obtained by
enriching for a certain subcellular fraction (purified cell
organelles) or by exploiting affinity (dyes, metals, drugs, etc.) or
hydrophobic properties of proteins before gel analysis. All of
the sequencing results accumulated so far in the human
protein database (20) (a few are shown in Fig. 2H) have been
obtained from analysis of protein spots collected from
2-dimensional gels that had been stained with Coomassie
blue according to standard procedures and dried for storage.
Proteins are recovered from the collected gel pieces by a
protein-elution-concentration device, combined with gel
electrophoresis and electroblotting. Details of this technique
have been reported in a previous communication (42) and a
brief outline is given below.

Combined gel pieces are allowed to swell in gel sample
buffer (a total volume of 1.5 ml). The gel pieces, combined
with the supernatant are then collected into a large slot made 
in a new gel. The slot is further filled with Sephadex G-10 
equilibrated in gel sample buffer. During consecutive gel 
electrophoresis, most of the electrical current passes on the 
side of the slot instead of passing through the slot. This 
results in both a vertical stacking and horizontal contraction 
of the protein band. With this device the protein is efficiently 
eluted from the gel pieces and concentrated from a large 
volume into a narrow spot. The highly concentrated (about 
o mm 2 ) protein spot is then electroblotted on PVDF
membranes, stained with Amido black, and digested in situ
with trypsin. The peptides generated during digestion elute 
from the membrane into the supernatant, and can be sepa- 
rated by narrow bore reversed-phase HPLC and collected in- 
dividually for sequence analysis. 

Using this and previous procedures (37, 39, 42), we have 
so far analyzed 70 protein spots collected from 
2-dimensional gels (20, and unpublished observations) (see 
for example Fig. 2H). The sequence information amounts to 
2100 allocated residues corresponding to an average of 30 
residues per protein spot. So far we have made cDNAs of 
many of the unknown proteins that have been microse- 
quenced, and a substantial number has been cloned and se- 
quenced. All available information indicates that it may be 
possible to obtain partial sequence information from most of 
the proteins that can be visualized by Coomassie Brilliant
Blue staining.

Partial protein sequences are stored in the database as
displayed in Fig. 2H, and it should be possible in the near
future to interface this information with forthcoming DNA
sequence data from the human genome project. In the long
run, as the human genome sequences become available, it
will be possible to assign partial protein sequences to genes
for which the full DNA sequence and chromosomal location
are known (Fig. 1).



SUMMARY 

The studies presented in this brief review are intended to
demonstrate the usefulness of computer-aided 2-dimensional
gel electrophoresis and microsequencing to analyze cellular
protein patterns, and to link protein and DNA information.
As more information is gathered worldwide, comprehensive
databases will depict an integrated picture of the expression
levels and properties of the thousands of proteins that
orchestrate most cellular functions.

Clearly, databases allow easy access to a large body of data
and provide an efficient medium to communicate
standardized protein information. In the future, databases will
foster a wide variety of biological information that can be
used to support collaborative research projects in basic and
applied biology as well as in clinical research (2, 5, 46). Once
a protein is identified in a particular database, all the
information gathered on it can be made available to the scientist.
However, many problems must be solved before protein
databases become of general use to the scientific community.
A most urgent one is to promote standardization of the gel
running conditions so that data produced in a given
laboratory may be used worldwide. Surprisingly, the gel running
technology as it stands today is still a craft.

Finally, comprehensive, computerized databases of
proteins, together with recently developed techniques to
microsequence proteins, offer a new dimension to the study
of genome organization and function (Fig. 1). In particular,
human protein databases may become increasingly
important in view of the concerted effort to map and sequence the
entire human genome. This formidable task is expected to
dominate biological research in the next decades.

We would like to thank S. Himmelstrup Jørgensen for typing the
manuscript and O. Sønderskov for photography. Work in the
authors' laboratories was supported by grants from the Danish
Biotechnology Programme, the Danish Cancer Foundation, and the
Commission of the European Communities.
Preparation of human tumors for analysis by 2-D electrophoresis 1045

Nonenzymatic extraction of cells from clinical tumor 
material for analysis of gene expression by two- 
dimensional polyacrylamide gel electrophoresis 

We have compared different methods of preparation of malignant cells for
two-dimensional electrophoresis (2-DE). We found all methods using fresh
tissue to be superior to methods using frozen tissue. Our results
indicate that nonenzymatic methods of preparation of tumor cells, including
fine needle aspiration, scraping and squeezing, have advantages over methods
using enzymatic extraction of cells. Nonenzymatic methods are rapid, appear
to reduce loss of high molecular weight protein species, and alleviate the
necessity of separating viable and nonviable cells by Percoll gradient
centrifugation. Using these techniques, high-quality 2-DE maps were derived
from tumors of the lung and breast. In the resulting polypeptide patterns,
heat shock proteins, non-muscle tropomyosins and intermediate filaments were
identified. We conclude that nonenzymatic extraction of malignant cells from
fresh tumor tissue improves the possibilities that these techniques may be
useful in clinical diagnosis.



1 Introduction 

Tumors may develop by a number of different mechanisms
in any given cell type. At the time of diagnosis,
tumors will have progressed along different pathways to
various stages of malignancy. To provide a basis for individual
therapy it is important to examine specific
properties of the tumor cell population in each patient.
A large number of different markers have been described
in order to increase diagnostic accuracy. It is
likely that a combination of several markers will be needed
in the future in order to reflect different properties of
the tumor. One important method for the resolution of a
large number of potential markers is two-dimensional
electrophoresis (2-DE). Extensive efforts are being made
to identify various polypeptides separated by 2-DE
and to characterize how the expression of these polypeptides
is affected by cellular transformation
and various culture conditions [1, 2]. It would be of
value to transfer this information to 2-DE separations of
polypeptides from tumor tissue samples. However, one
prerequisite is that the quality of the 2-DE gels from
tumor samples is comparable with that of 2-DE gels
from samples of cultured cells.

Frozen tumor tissues are commonly used for various biochemical
assessments. However, if such samples are analyzed
by 2-D polyacrylamide gel electrophoresis (PAGE),
the polypeptide patterns are obscured by contamination
with serum and connective tissue proteins. Such
nontumor-cell-related variations represent serious problems in
the interpretation and inter-patient comparison of 2-DE
patterns [3]. 2-DE patterns of cells prepared from fresh
tumor material were analyzed after enzymatic extraction
of tumor cells [4, 5] or after culturing tumor fragments in
medium containing radioactive amino acids [6]. These
procedures may, however, lead to alterations in the gene
expression/polypeptide patterns. We are only aware of
one study where nonenzymatic extraction of cells from
fresh tumor tissue (prostate cancer) was used to prepare
samples for 2-D PAGE [4]. We have examined enzymatic
extraction and various nonenzymatic preparation techniques,
including fine needle aspiration, for the preparation
of cells from fresh tumor tissues. We describe
nonenzymatic extraction procedures that are rapid, lead
to high-quality 2-DE patterns, and alleviate the
necessity to purify tumor cell populations from dead
cells.

2 Materials and methods 

2.1 Cell cultures and samples used for spot 
identification 

A rat embryonal fibroblast cell line, WT2 (a kind gift
from Dr. J. I. Garrels and Dr. S. Pattersson), was used for
the identification of a number of heat shock and structural
proteins. Human normal diploid lung fibroblasts,
WI38, and human epithelial breast carcinoma cells,
MDA-231 and MCF-7, were purchased from ATCC and grown
as recommended. Polypeptides prepared from a leu- 
kemia type pre-B-ALL were separated by 2-DE. The 
2-DE map was then analyzed by Dr. S. M. Hanash (Uni- 
versity of Michigan, Ann Arbor, USA). 



2.2 Tumor tissue samples

In this study, 2-DE maps from seven tumors were used
as representative illustrations: two adenocarcinomas of
the lung (LA and LB, mucinous, both cases intermediate
grade of differentiation), one squamous carcinoma
of the lung (LS), one carcinoid-like breast cancer (BC),
one microfollicular adenoma (highly differentiated) of
the thyroid (TA), one highly differentiated hypernephroma,
a tumor of the kidney (KH), and finally one case
of poorly differentiated corpus carcinoma (CP).

2.3 Preparation of cultured cells 

The cell monolayers were washed twice in phosphate
buffered saline (PBS) and then scraped off in ice-cold
PBS including protease inhibitors (PIH): 0.2 mM
phenylmethylsulfonyl fluoride (PMSF) and 0.83 mM benzamidine.
The cells were pelleted at 660 × g for 3 min (+4°C) and washed
once before final centrifugation at 2700 × g for 5 min. The
wet weight of the cell pellet was recorded and the cells
were stored at -80°C until further processing.

2.4 Preparation of tumor tissue samples 

2.4.1 General remarks 

Macroscopically representative and non-necrotic tumor 
tissues were selected within 20 min after resection. 
Parallel samples were routinely prepared for cytology. 
The samples were processed as rapidly as possible on ice 
or at +4°C and in the presence of PIH. Cells were
stained with DiffQuick (Baxter) and usually examined on
three different occasions during the preparation procedure:
(i) cytology sample, (ii) extracted cells and (iii)
cells after Percoll gradient centrifugation.

2.4.2 Specimen acquisition 

The strategy of sample preparation is shown in Fig. 1.
Tumor tissue cell samples were usually obtained by fine
needle aspiration (NA) using a 0.7 mm needle. The
syringe was filled with 1-2 mL of ice-cold culture
medium/PIH. We found that if a tumor appeared to be very
fibrous, it was difficult to extract enough cells for 2-DE
analysis. In these cases, two alternative techniques were
examined. (i) The tumor was cut in the middle and the
fresh surface scraped (SC) with a scalpel. The cell-rich
material was then transferred to ice-cold culture
medium (L15 with 5% fetal calf serum)/PIH. (ii) A part
of the tumor sample was placed in culture medium on
ice for further processing at the laboratory in the following
way: the material was cut into very small fragments
on a pre-cooled dissection plate and transferred
to a small glass chamber with a 0.7 mm metal net 5 mm
above the bottom of the chamber. Medium/PIH was
added to cover the sample (8 mL), which was gently
squeezed (SQ) towards the net in order to release and
wash out cells. NA and SC were also compared with an
enzymatic extraction (EE) procedure described previously [5].
Briefly, thin slices of tissue were incubated
with collagenase (1 mg/mL) and elastase (2 mg/mL) in
medium for 1 h at 37°C. Extracted cells from every
sample were then subjected to Percoll gradient centrifugation
(Section 2.4.3).

2.4.3 Separation of cells by Percoll gradient
centrifugation

The cell suspension was filtered through two nylon mesh
filters, (i) 250 µm and (ii) 100 µm, and then centrifuged
at 660 × g for 3 min. The cell pellet was resuspended
carefully in medium, using a syringe, and loaded onto a
two-step discontinuous Percoll/PBS gradient, 20.4%
(density = 1.03 g/mL) and 54.7% (density = 1.07 g/mL),
and centrifuged at 1000 × g for 15 min. In this system,
dead cells stay on the top, viable cells sediment to the
interphase and erythrocytes sediment to the bottom. The
viability of cells in the top fraction and interphase was
checked by the trypan blue exclusion test. The interphase
cell layer (> 90% viability) was collected and
washed one time in a large volume of PBS/PIH (centrifuged
at 800 × g for 3 min). Finally, the cells were resuspended
in 1.4 mL PBS and pelleted at 2700 × g for 5
min. The wet weight (WW) was recorded and the pellet
was then stored at -80°C.

2.4.4 Final preparation of cells for 2-D PAGE analysis

From this point, cultured cell samples were treated
in the same way as tumor cell samples. Each cell pellet
was thawed on ice and resuspended in 1.89 µL mQ water
per mg WW (= (1.89 × WW) µL). The suspension was
frozen and thawed 4-5 times to break the cells [7]. A
volume of (0.089 × WW) µL of 10% sodium dodecyl
sulfate (SDS), including 33.3% mercaptoethanol, was
mixed with the sample and incubated for 5 min on ice with
(0.329 × WW) µL of a solution of DNase I (0.144
mg/mL in 20 mM Tris-HCl with 2 mM CaCl2 × 2H2O, pH
8.8) and RNase A (0.0718 mg/mL Tris) [8, 9]. The sample
was frozen and lyophilized. Sample buffer [10] including
Figure 1. Experimental flow chart showing main steps of the preparation
procedures. The abbreviations used for nonenzymatic extraction
procedures are: FZ, frozen sample preparation; NA, needle aspiration;
SC, scraped; and SQ, squeezed sample. Extracted cells are then
loaded as a suspension (top volume of each tube) onto either
1.07 g/mL Percoll (left), or a discontinuous Percoll gradient from the
nonenzymatic extraction (middle), or from enzymatic extraction
(right). Cellular top- and interphase fractions are then used for 2-DE.
For details see Section 2.
PMSF (0.2 mM), EDTA (1.0 mM), 0.5% Nonidet P-40
(NP-40), and 3-[(3-cholamidopropyl)dimethylammonio]-
1-propanesulfonate (CHAPS; 25 mM) was added carefully,
mixed for 2.5 h and centrifuged for 15 min at




10000 rpm to remove any insoluble material. Duplicate
or triplicate samples were taken for protein determination [11].
Samples were stored at -80°C prior to isoelectric
focusing (IEF).
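Because the lysis volumes in this section scale with the wet weight (WW) of the pellet, they can be tabulated with a small helper. This is a sketch only: the factors 1.89, 0.089 and 0.329 µL per mg are read from the partly illegible source text and should be treated as assumptions, and the function and key names are hypothetical.

```python
# Volume factors in microliters per mg wet weight (assumed readings of the text).
WATER_UL_PER_MG = 1.89
SDS_UL_PER_MG = 0.089
NUCLEASE_UL_PER_MG = 0.329

def lysis_volumes_ul(wet_weight_mg):
    """Return the reagent volumes (in microliters) for a pellet of given wet weight."""
    if wet_weight_mg <= 0:
        raise ValueError("wet weight must be positive")
    return {
        "mq_water": WATER_UL_PER_MG * wet_weight_mg,
        "sds_mercaptoethanol": SDS_UL_PER_MG * wet_weight_mg,
        "dnase_rnase_solution": NUCLEASE_UL_PER_MG * wet_weight_mg,
    }

# Example: a 20 mg pellet.
volumes = lysis_volumes_ul(20.0)
total_ul = sum(volumes.values())
```

Scaling every reagent to wet weight keeps the protein-to-detergent ratio comparable between small aspirates and larger cultured-cell pellets.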
2.4.5 Preparation of frozen tumor tissue

The technique has been described previously [3, 12].
Briefly, the sample is ground while frozen to a fine powder,
homogenized, lyophilized and solubilized in sample
buffer.

2.4.6 Control of representative 

The tumors were examined routinely by experienced 
pathologists and smears or imprints from the samples 
were also assessed for cytometric DNA content by 
microspectrophotometry.

2.5 2-D PAGE 

2-D PAGE was performed as described [8, 10] except for the following
details. The glass tubes for IEF, 1.2 x 200 mm, contained 2.0%
Resolyte, pH 4-8 (BDH) and were cast to a height of 180 mm. A stock
solution of acrylamide (Serva) and N,N'-methylenebisacrylamide (16.7:1
for IEF and 37.5:1 for the second dimension) was deionized by mixing
with 5% w/v Duolite MB 5313 mixed-resin ion exchanger (BDH) for 30 min,
filtered (with a 0.22 µm nitrocellulose filter) and stored at -70°C.
N,N'-Methylenebisacrylamide, N,N,N',N'-tetramethylethylenediamine
(TEMED) and ammonium persulfate were purchased from Bio-Rad. IEF tubes
were prefocused at 200 V for 60 min. To each tube a sample
corresponding to 20-40 µg protein was applied and focused for 14.5 h at
800 V and finally 1.0 h at 1000 V using a Protean II cell (Bio-Rad) and
Model 1000/500 Power Supply (Bio-Rad). The tube gels were finally
extruded into 1.25 mL equilibration buffer, containing 60 mM Tris,
pH 6.8 (2% SDS, 100 mM dithiothreitol and 10% glycerol), frozen on dry
ice and stored at -70°C. The acrylamide concentration of the second
dimension (1.0 x 180 x 90 mm) was 10% T, and the gel contained 376 mM
Tris, pH 8.8, and 0.1% SDS. IEF gels were applied on top of the slab
gel, sealed with 0.5% agarose containing electrophoresis running
buffer (60 mM Tris-base, 0.2 M glycine and 0.1% SDS) and
electrophoresed with 10-11 mA per gel (constant current) at +10°C. Six
gels were run together in a Protean II xi 2-D Multi-Cell (Bio-Rad).
Proteins were visualized by silver staining and photographed with the
acidic side to the left [13, 14].
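
As a quick check of the running conditions above, the volt-hour product
of the IEF program and the cross-linker fraction (%C) implied by the two
acrylamide:bisacrylamide ratios can be worked out as in the following
sketch. The function names are illustrative and not part of the original
protocol, and %C is assumed here to follow the usual convention of
bisacrylamide mass divided by total monomer mass.

```python
def volt_hours(steps):
    """Sum voltage x time over (volts, hours) steps of a focusing program."""
    return sum(volts * hours for volts, hours in steps)


def percent_c(acrylamide_to_bis_ratio):
    """Cross-linker fraction %C for an acrylamide:bis mass ratio of r:1.

    Assumes %C = bis / (acrylamide + bis) * 100.
    """
    return 100.0 / (acrylamide_to_bis_ratio + 1.0)


# Values quoted in the text: prefocusing at 200 V for 60 min, then
# focusing for 14.5 h at 800 V and 1.0 h at 1000 V.
prefocus = [(200, 1.0)]
focusing = [(800, 14.5), (1000, 1.0)]

print(volt_hours(prefocus))        # 200.0
print(volt_hours(focusing))        # 12600.0
print(round(percent_c(16.7), 2))   # 5.65 (IEF gel)
print(round(percent_c(37.5), 2))   # 2.6  (second dimension)
```

Under these assumptions the sample focusing amounts to 12.6 kVh, and the
second-dimension gel is 10% T at roughly 2.6% C.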

2.6 Identification of polypeptides 

Vimentin and vimentin-derived polypeptides were identified by
extraction of an MDA-231 cell lysate with 0.6 M KCl/0.5% NP-40 [15].
Tropomyosins were extracted from MDA-231 and WI38 cell lysates [16],
and cytokeratins were extracted from MDA-231 and MCF-7 cell lysates
[17]. The patterns were compared with published maps [19-21].
Proliferating cell nuclear antigen (PCNA) was identified by
immunoblotting (PC10 mAb, Dakopatts) using a semidry system (Multiphor
II Nova Blot, Pharmacia-LKB Biotechnology AB) and enhanced
chemiluminescence (ECL) detection (Amersham).

3 Results 

3.1 2-DE of samples prepared from normal and
tumorigenic cultured cells

The object of this study was to develop methods for pre- 
paration of 2-DE maps from human tumor tissue which 
have the same high resolution as those obtained from 
cultured cells. Shown in Fig. 2 are high resolution 2-DE 
gels prepared from cultured cells and one leukemia: 
SV40 transformed embryonal rat fibroblasts WT2 (Fig. 
2a); human MDA-231 breast carcinoma cells (Fig. 2b); 
human WI38 fibroblasts (Fig. 2c) and human pre B-ALL 
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Figure 3. 2-DE analysis of a case of lung adenocarcinoma (LA). Comparison of 2-DE gel quality between (Ai frozen and (B) fresh (needle 
aspiration) tissue preparation. 
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cells (Fig. 2d). Polypeptides were identified through a 
laboratory exchange of cell samples/2-DE maps and
through 2-DE analysis of purified proteins (Table 1). 

3.2 Preparation of samples from solid tumors 
3.2.1 Fresh versus frozen tissue 

An adenocarcinoma of the lung (LA) was prepared for 
2-DE by conventional methods using frozen material 
(Fig. 3a). There are several possibilities for the poor reso- 
lution using frozen tissue, including the presence of high 
molecular weight protein aggregates. Filtering extracts 
through 0.1 µm filters (Durapore, Millipore) resulted in
a slightly improved resolution (not shown). When fresh 
tumor tissue from tumor LA was used for sample prepa- 
ration, using fine needle aspiration to collect the cells, 
the resolution was considerably improved (Fig. 3b). The 
use of fresh tissue resulted in a general increase in reso- 
lution, which was most pronounced in the 50-100 kDa 
molecular mass range. A number of differences in the 
protein profiles of the gels in Figs. 3a and 3b can be ob- 
served, some of which are indicated in the figures. The 
decrease in serum albumin in Fig. 3b is likely to result 
from loss of serum proteins occurring when cells were 
pelleted after aspiration. Other differences, such as the 
decreased level of transformation-sensitive tropomyosins 
(TM1-TM3), may result from enrichment of tumor cells
in the sample of Fig. 3b. Fine needle aspiration, a well- 
established technique in cytology, extracts mainly tumor 
cells because of decreased intercellular adhesiveness of 
neoplastic cells as compared to normal tissue. Micros- 
copic examination of Diff-Quick-stained extracted cells 
from case LA revealed almost 100% tumor cells, 
whereas the whole tissue extract contained approximate- 
ly 60% tumor cells. 
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Figure 2-DE analysis of a case ol" breast carcinoma (BO. Compari: 
heads indicate increased intensity and circles or bracket indicate decreased
nonenzymatically (scraped) tissue preparation.



Table 1. Names and abbreviations for identified spots

Spot    Name                                   Basis for identification

A       Actins                                 a
aA      alpha-Actinin                          a
B23     Protein B23/Numatrin                   a
EF2     Elongation factor 2                    a
EF1     Elongation factor 1 beta               a
GT      Glutathione-S-transferase (pi)         a
hsp60   Heat shock protein 60                  a
hsp73   Heat shock protein 73                  a
hsp80   Heat shock protein 80, GRP78, BIP      a
hsp90   Heat shock protein 90                  a
hsp100  Heat shock protein 100, Endoplasmin    a
IFa     Intermediary filament associated       a
k8      Cytokeratin 8                          b and a
LamB    Lamin B                                a
Lip1    Lipocortin I                           a
Lip2    Lipocortin II                          a
Lip5    Lipocortin V                           a
Mit1    Mitcon 1/beta-F1 ATPase                a
Mit2    Mitcon 2                               a
Mit3    Mitcon 3                               a
MRP     Mucine Related Polypeptides
pcna    Proliferating cell nuclear antigen     c and a
PLC     Phospholipase C (1)                    a
RO      RO/SS-A antigen                        a
SA      Serum Albumin                          b and a
aT      alpha-Tubulin                          a
bT      beta-Tubulin                           a
tm1     Non-muscle tropomyosin isoform 1       b and a
tm2     Non-muscle tropomyosin isoform 2       b and a
tm3     Non-muscle tropomyosin isoform 3       b and a
tm4     Non-muscle tropomyosin isoform 4       b and a
tm5     Non-muscle tropomyosin isoform 5       b and a
TPI     Triose phosphate isomerase             a
V       Vimentin                               b and a
Vid1    Vimentin derived protein               b and a
Vid2    Vimentin derived protein               b and a
Vid3    Vimentin derived protein               b and a
Vid4    Vimentin derived protein               b and a
Vin     Vinculin                               a

a. homologous position with respect to other mammalian systems
b. purified protein(s)
c. immunoblotting
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of 2-DE quality and some differences in detected spots (arrow 
intensity of the same spots) between (A) enzymatically and (B)
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322 Comparison of different methods for preparing 
cells from fresh tumor tissue 

Samples were prepared from breast and lung carcinomas 
using either an enzymatic treatment with collagenase/ 
elastase or using nonenzymatic preparations (Fig. 4). A 
number of differences in the protein profiles were ob- 
served in the resulting 2-DE gels, some of which are 
indicated in Figs. 4a and b. These differences include 
both increases and decreases in spot intensity. These dif- 
ferences may result from degradation of high molecular 
weight polypeptides during enzymatic treatment, in- 
creased solubilization of polypeptides, or may have other 
causes. For many tumors, it was only possible to obtain 



small amounts of material since they were reserved for 
other examinations. In these cases, samples could be pre- 
pared for 2-DE using either needle aspiration or 
scraping. Figure 5a shows a 2-DE gel prepared from 
squamous lung carcinoma (LS) cells collected by needle 
aspiration and Fig. 5b shows a gel prepared from the 
same tumor by scraping. In this case, a number of differ- 
ences were recorded between the two procedures, some 
of which are arrowed in Fig. 5. Samples obtained from 
other tumors (breast and lung) generally showed fewer 
differences between these two methods of cell sampling 
(not shown). These data show that different nonenzy- 
matic extraction procedures may yield different polypep- 
tide patterns. However, the number of spots with a large 





Figure 5. 2-DE analysis of a case of lung cancer (LS). Comparison of 2-DE gel quality and detected spots (arrowheads and circles) between
(A) aspirated (needle aspiration) and (B) scraped preparations from fresh tissue.




Figure 6. 2-DE analysis of three other types of tumors, (A) hypernephroma, (B) an adenoma of the thyroid and (C) corpus cancer, using the
nonenzymatic preparation technique. Arrowheads and circles indicate some cytosolic polypeptides.
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difference in intensity were lower than when a nonenzy- 
matic preparation was compared with an enzymatic pre-
paration. 

2-DE maps of satisfactory quality were prepared by a 
third procedure. Cells were released from small pieces of 
tumor by squeezing (see Section 2). Some examples of 
this are shown in Fig. 6 where 2-DE maps derived from 
a case of hypernephroma, KH (Fig. 6a), a case of thyroid
tumor, TA (Fig. 6b) and a case of corpus cancer, CP (Fig.
6c) can be seen. We conclude that nonenzymatic tech-
niques are useful for 2-DE analysis of a number of dif- 
ferent tumors. The quality of the resulting gels is com- 
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parable to that obtained using cultured cells (compare 
the gels in Fig. 2 with those in Figs. 4, 6 and 7). Which of
these methods is optimal will, in our experience,
depend on the tumor material. For example, very small
tumors are preferably extracted by squeezing; on the
other hand, breast cancers (which are often fibrous)
yield satisfactory samples using scraping.

3.2.3 Purification of cells on Percoll gradients

We considered the possible advantage of separating 
viable cells from dead cells, erythrocytes, and debris 
using discontinuous Percoll gradients. Cells collected 
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from the interphase showed a viability of more than 
90% as judged by the trypan blue exclusion test. However, it was found
that the yield of viable cells decreased dramatically if the tissue
resection was not immediately processed. To study the effect of lysis
of cells during the preparation procedure, 2-DE maps were prepared from
nonenzymatically extracted cells of case LB collected from the top
fraction (nonviable, Fig. 7a) and interphase fraction (viable,
Fig. 7b). These 2-DE maps were compared with corresponding fractions
(nonviable, Fig. 7c, and viable, Fig. 7d) of enzymatically extracted
cells. One clear disadvantage of the enzymatic technique was that when
loss of cell viability occurred during preparation, a dramatic loss of
high molecular weight polypeptides was observed (Fig. 7c). This was
probably due to degradation of intracellular proteins. However,
nonenzymatic preparations showed fewer differences between viable and
nonviable cells: the most pronounced alteration was a decrease of a
group of mucine-related proteins (Fig. 7b). We conclude, therefore,
that a discontinuous Percoll gradient is necessary after enzymatic
extraction of cells, but can be omitted from the nonenzymatic tumor
sample preparation procedure.

We used the MDA-231 cell line to study the effects of cell lysis and
leakage of cytosolic polypeptides during sample preparation.
Remarkably, after 30, 50, 80 and 140 min of incubation in PBS/PIH at
0°C, no significant changes were observed in the 2-DE pattern (not
shown). Although loss of cell viability may not result in protein
degradation when cells are incubated in the presence of protease
inhibitors, loss of cytosolic proteins would be expected during
pelleting of cells. We monitored the loss of lactate dehydrogenase
(LDH) activity into the supernatant during incubation in PBS of MDA-231
and MCF-7 breast cancer cells at 20°C. In both cases, loss of viability
was paralleled by release of LDH from the cells (Fig. 8). After 5 h,
70% of the MCF-7 cells, but only 30% of the MDA-231 cells were dead
(not shown).
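
The relative LDH release plotted in Fig. 8 is a fraction of total
activity found in the supernatant. A minimal sketch of that arithmetic
is given below; the function name and the activity readings are
hypothetical and only illustrate the calculation, they are not data
from this study.

```python
def relative_release(supernatant_activity, cell_activity):
    """Fraction of total LDH activity recovered in the supernatant.

    Both arguments are activities in the same (arbitrary) units; the total
    is taken as supernatant plus cell-associated activity.
    """
    total = supernatant_activity + cell_activity
    if total <= 0:
        raise ValueError("total LDH activity must be positive")
    return supernatant_activity / total


# Hypothetical readings (supernatant, cell pellet) at increasing times.
time_course_h = [0, 1, 3, 5]
readings = [(5.0, 95.0), (10.0, 90.0), (20.0, 80.0), (30.0, 70.0)]

for hours, (supernatant, pellet) in zip(time_course_h, readings):
    print(hours, relative_release(supernatant, pellet))
```

Plotting this fraction next to the fraction of trypan-blue-excluding
cells at each time point gives the kind of comparison shown in Fig. 8.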
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Figure * The relative release i traction in supernatant of total! of lac- 
tate dehydrogenase activity (LDH) and cell viability versus incuba-
tion time of the mammary carcinoma cell lines MDA-231 and MCF-7
during incubation in PBS at 20°C.



These data indicate the importance of a rapid preparation procedure, at
low temperature, for fresh tumor samples. Experiments have also been
performed using only 1.07 g/mL Percoll (Fig. 6c and Fig. 1, left test
tube) in order to remove erythrocytes. One clear advantage of this
procedure, which today is routinely utilized, is a higher yield of
viable cells, probably due to decreased sample preparation time.



4 Discussion 

We describe procedures for sample preparation from solid tumors for
2-DE. 2-DE maps could be derived from solid tumors which were similar
in quality to those obtained from cultured cells. Compared to methods
using frozen material, the resolving power of the 2-DE technique is
increased, allowing examination of a large number of polypeptides from
tumors of different malignancies. Other investigators [12, 22] have
used samples from frozen tumors to derive 2-DE maps. We have previously
described disadvantages encountered using frozen tumor samples
including variations in contaminating proteins between different
samples [3]. The methods described here are based on the preparation of
cells from tumors without enzymatic digestion. The enzymatic step could
be avoided since malignant cells usually grow as solid masses which are
not strongly attached to the matrix. Furthermore, we found that
omitting the enzymatic digestion alleviated the necessity of purifying
viable tumor cells on Percoll gradients. This was in sharp contrast to
enzymatically treated samples, where loss of viability leads to loss of
high molecular weight proteins (Fig. 7c).

At least in the case of lung cancer, viable and nonviable 
cells showed small differences in respect to 2-DE maps. 
Presumably, protease inhibitors penetrate cells and 
inhibit proteolysis. In model experiments, we observed 
leakage of cytosolic protein (LDH) from the cells in 
parallel to loss of viability. Apparently, however, only a 
limited decrease of the level of low molecular weight 
cytosolic polypeptides was detected using silver staining 
combined with visual inspection. We have found that 
although some tumors are well suited for the prepara- 
tion procedure described, others are not. In general, 
good results were obtained using tumors of the lung, 
breast, corpus and lymphomas. In contrast, cells from 
thyroid adenomas and hypernephroma showed poor via- 
bility. We were in these cases unable to separate nonvi- 
able cells from viable cells, and we can therefore not 
evaluate the consequence of the loss of viability on 
2-DE patterns, apart from a loss of some low molecular 
weight cytosolic polypeptides. 

Highly differentiated tumors may show lower viability as 
compared with poorly differentiated tumors (Dr. Farkas 
Vanky, personal communication). A number of samples 
from thyroid tumors were prepared for 2-DE but most 
cases showed poor viability. We believe that special care 
is needed during preparation of generally highly differen- 
tiated tumor groups. The difference between loss of via- 
bility/leakage of LDH of the more differentiated MCF-7 
cells and the less differentiated MDA-231 cells is in line 
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with these observations (Fig. 8). A number of potential 
and interesting markers, like tropomyosin isoforms, cytokeratins
and heat shock proteins, appear to be insensitive to loss of
viability during the preparation procedure. We have to date made
numerous observations of alterations in the expression of these
polypeptides in breast cancers and lung cancers.

Another problem that may occur, irrespective of the sample preparation
technique used, is admixture of lymphocytes. These cases are easily
detectable in smears and it may therefore be possible to select
lymphocyte-specific spots as "internal markers" for the 2-D PAGE
analysis. Studies using this approach are in progress. Many of the
polypeptides identified are structural (Table 1). Since the expression
of many of these polypeptides is known to vary between normal and
malignant cells, the possibility of determining their expression
simultaneously is appealing. In the specific case of breast cancer,
alterations in the expression of intermediate filament proteins
(cytokeratins) are known to occur during tumor progression [23]. Other
proteins known to be differentially expressed between normal cells and
transformed cells are tropomyosins, numatrin/B23, heat shock proteins
and PCNA. To this end, we have observed alterations in the expression
of cytokeratin 8, hsp 90, and non-muscle tropomyosin isoform 2 during
malignant progression (Okuzawa et al., in preparation and Franzén
et al., in preparation).

The method of choice for sample preparation from tumor tissues will
depend on the properties of the tumor material studied. It may be
important to use only one method when comparing cases within one group,
as differences were observed between methods. The advantages of the
nonenzymatic techniques are (i) that they minimize contamination with
connective tissue, (ii) that problems with contamination by serum
proteins are avoided, and (iii) that separation of viable and dead
cells is not necessary. In this way the resolving power of 2-D PAGE is
maximized for the analysis of human tumors, and studies on inter-tumor
variations in gene expression are facilitated. In addition, the
polypeptide patterns obtained may be more representative of the in vivo
tumor cell since the use of enzymes and incubations has been minimized.

We would like to thank Dr. J. I. Garrels, Dr. S. Patterson,
Dr. S. M. Hanash and Dr. J. E. Celis for making sample
and 2-DE map exchanges possible. This study was sup-
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Reference points for comparisons of two-dimensional 
maps of proteins from different human cell types 
defined in a pH scale where isoelectric points correlate 
with polypeptide compositions 

A highly reproducible, commercial and nonlinear, wide-range immobilized
pH gradient (IPG) was used to generate two-dimensional (2-D) gel maps
of [35S]methionine-labeled proteins from noncultured, unfractionated
normal human epidermal keratinocytes. Forty-one proteins, common to
most human cell types and recorded in the human keratinocyte 2-D gel
protein database, were identified in the 2-D gel maps and their
isoelectric points (pI) were determined using narrow-range IPGs. The
latter established a pH scale that allowed comparisons between 2-D gel
maps generated either with other IPGs in the first dimension or with
different human protein samples. Of the 41 proteins identified, a
subset of 18 was defined as suitable to evaluate the correlation
between calculated and experimental pI values for polypeptides with
known composition. The variance calculated for the discrepancies
between calculated and experimental pI values for these proteins was
0.001 pH units [...], indicating that there is no significant
difference between the calculated and experimental pI values. The
precision of the calculated values depended on the buffer capacity of
the proteins, and on average it improved with increased buffer
capacity. As shown here, the [...] information on polypeptide
sequences cannot be assumed to be sufficient for calculating pI values
because post-translational modifications, in particular [...] this
study, 18-20 were found to be N-terminally blocked and of these only 6
[...]. The probability of N-terminal blockage [...] the N-terminal
group. Twenty-six of the proteins had [...] N-terminal amino acids and
of these 17-19 were blocked [...] proteins containing other N-terminal
groups were



1 Introduction

As compared with carrier ampholyte isoelectric focusing (CA-IEF), the
application of immobilized pH gradients (IPGs) in the first dimension
in 2-D gel electrophoresis offers improved reproducibility [1] because
the nature of the pH gradient makes the resulting focusing positions
insensitive to the focusing time [2] and to the type of sample applied
[3]. The recently introduced ready-made IPG strips [4] seem to be an
ideal substitute for the carrier ampholyte gradients, which until now
have been the most commonly used first dimensions in 2-D gel
electrophoresis. The availability of standardized first dimensions
opens the possibility of comparing 2-D gel maps of various cell types
generated in different laboratories, provided that the focusing
positions of a number of easily recognizable polypeptide spots common
to the cell types
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in question are known. Even though this approach is 
limited to experiments performed with the same standardized IPG, the
flexibility provided by IPGs allows the pH gradient to be adjusted to
the requirements of a particular experiment.

Exchange and communication of 2-D gel protein data requires a pH scale
that is independent of the particular IPG used and by which the results
can be described. The introduction of carbamylation trains and the
relation of focusing positions to the spots in these trains represented
a step forward towards solving the reproducibility problem experienced
with carrier ampholyte focusing [5]. Problems associated with the use
of carbamylation trains were mainly due to lack of temperature control
and to the use of nonequilibrium focusing conditions. Accordingly, the
pattern variation involved not only the resulting pH gradients, but
also the relative spot positions as related to each other and to spots
in the carbamylation trains. Even though the question of
reproducibility has, to a large extent, been solved, the carbamylation
trains are still not ideal as markers because the spots in the trains
do not represent defined entities but rather a large number of
differently carbamylated peptides having close pI values. As a result,
the spots are large and poorly defined as compared to the ordinary
polypeptide spots in 2-D gel maps.

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Neidhardi etaL (6) defined the pH gradient in 2-D gel 
experiments by pI markers whose pI values were calculated from the
amino acid composition. Focusing positions of other polypeptides could
be predicted from their composition but the pK values needed for the pI
calculations were unknown. Various groups employing this approach do
not use the same pK values [6, 7] and therefore, the pI values derived
in this way cannot be expected to describe the variation of the
hydrogen ion activity. In spite of this fact, it is still possible to
make approximate predictions of focusing positions because the pK
values used to define the pH gradient are also used to calculate pI
values and to predict the focusing positions. Errors in pK assignments
are therefore compensated. A pH scale which correctly reflects the
variation in hydrogen ion activity during focusing should improve the
precision of the predictions, but this has never been implemented with
CA-IEF focusing as a first dimension in 2-D gel electrophoresis. The
main reason for this is the problems associated with pH measurements in
focused gels containing high concentrations of urea.

IPGs can be described from the concentration variation of the
immobilized groups, provided that the pK values of these groups are
known for the conditions prevailing during focusing. To avoid
measurements on gels, Gianazza et al. [8] suggested the use of pK
values derived by addition of determined pK shifts. Recently, direct
determinations of pK differences between immobilized groups in IPGs
were made by determining pI-pK values in overlapping narrow-range IPGs
[9, 10] and the results verified the applicability of the Gianazza
approach. A description of the focusing results in a pH scale, which
correctly describes the variation of the hydrogen ion activity for the
focusing conditions used, not only allows the comparison of 2-D gel
maps generated with different IPGs, but also opens the possibility for
correlating the focusing position of a polypeptide with its composition
[9]. Experiments by Bjellqvist et al. [9, 10] have implied that pH
scales showing good correlation between calculated and experimental pI
values can be derived for any of the conditions commonly used for
focusing in connection with 2-D gel electrophoresis. These pH scales
are then defined through the pK values of the immobilized groups in the
IPG-containing gel. To be useful for interlaboratory comparisons,
however, the pH scale has to be defined through pI values of easily
recognizable spots present in the 2-D gel map. So far, pI
determinations in a useful pH scale, combined with determinations of pK
values needed for pI calculations, have only been made for the pH range
4.5-6.5 at 10°C [9]. CA-IEF focusing as described by O'Farrell [11]
does not control the temperature of the first dimension, which can be
expected to be slightly above room temperature. With IPGs, the
temperature commonly used is about 20°C [4, 12] or 25°C [13] and this
is a critical parameter that needs to be controlled [14].

The present work was designed to compare 2-D gel maps of different cell
types in a laboratory applying both CA-IEF and IPG focusing at a common
temperature. To this end we have generated 2-D gel maps of proteins
from noncultured, unfractionated normal human epidermal keratinocytes
with IPG in the first dimension and a focusing temperature of 25°C. We
have used commercial nonlinear, wide-range IPG strips which give 2-D
gel maps that are closely similar to the ones resulting with the CA-IEF
technique used to establish the human keratinocyte database [15]. As an
initial step towards interlaboratory comparisons of results obtained
with the nonlinear gradient as a first dimension we report here on the
focusing positions of 41 known proteins that are common to most human
cell types. The pH range covered corresponds to the range in classical
CA-IEF 2-D gel electrophoresis and in order to use these proteins as
internal standards for comparing 2-D gel maps generated with other IPGs
we determined their pI values with narrow-range IPGs in the first
dimension. We have compared the calculated versus experimental pI
values and show that it is necessary to have further information
(absence or presence and nature of posttranslational modifications), in
addition to amino acid composition, to be able to calculate pI values
that correspond to the actual experimental values. The pK values used
for the calculations are provided and the usefulness of pI prediction
in relation to database information is discussed. Furthermore, we
comment on the possibility of using experimentally determined pI values
to verify the available database information on polypeptide
composition.



2 Materials and methods 
2.1 Apparatus and chemicals 

Equipment for isoelectric focusing and horizontal SDS electrophoresis
(Multiphor II electrophoresis chamber, Immobiline strip tray,
Multidrive XL programmable power supply, Macrodrive power supply and
Multitemp II) was from Pharmacia LKB Biotechnology AB (Uppsala,
Sweden). Vertical second-dimensional gels were run in the home-made
equipment described in [15]. The IPG strips with the wide-range
nonlinear pH gradient were either Immobiline DryStrip pH 3-10 NL
180 mm or alternatively 160 mm long IPG strips with a corresponding pH
gradient. In both cases the IPG strips were delivered by Pharmacia LKB.
Immobiline, Pharmalyte, Ampholine, GelBond as well as PAG film and the
ready-made horizontal SDS gels (ExcelGel XL SDS 12-14) were also from
Pharmacia LKB. Purified proteins and peptides were from Sigma
(St. Louis, MO).

2.2 Sample preparation

Preparation and labeling of unfractionated keratinocytes
as well as fibroblasts have been described in [16]. Cells
were lysed in a solution containing 9.8 M urea, 2% w/v
NP-40, 100 mM DTT and 2% v/v Ampholine pH 7-9.

2.3 2-D gel electrophoresis

First-dimensional focusing was performed according to Görg et al. [2]
with some minor modifications, as described in [9]. Rehydration of the
IPG strips was made in a solution containing 9.8 M urea, 2% w/v CHAPS,
10 mM DTT and 2% v/v carrier ampholyte mixture. The carrier ampholyte
mixture consisted of 2 parts Pharmalyte pH 4-6.5, 1 part Ampholine
pH 6-8 and 1 part Pharmalyte pH 8-10.5. Usually, cathodic sample
application was used and the samples were diluted 2-20 times in a
solution containing 9.8 M urea, 4% w/v CHAPS, 1% w/v DTT and 35 mM Tris
base. For acidic application, the Tris-base was substituted with
100 mM acetic acid. The degree of dilution and sample volume
(20-100 µL) depended on the particular sample and the IPG, and whether
visualization of the proteins was to be done by Coomassie Brilliant
Blue or silver staining. With the wide-range nonlinear IPG, 10-30 µg of
total protein was loaded for silver staining and 100-200 µg for
Coomassie staining. Focusing was done overnight with Vh products in the
range of 45-60 kVh with 160 mm long strips and 50-70 kVh with 180 mm
long strips. Solubilization of polypeptides and blocking of -SH groups
prior to the second-dimensional run, as well as loading on the
second-dimensional gel, was done as described in [9]. The stacking gel
was omitted and 5-10 mm were left at the top of the second-dimensional
gel for applying the IPG strip. The space was filled with electrode
buffer containing 0.5% w/v agarose. Casting, running, staining and
autoradiography were carried out as described in [15].
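
The 2:1:1 carrier ampholyte mixture at 2% v/v translates into concrete
pipetting volumes once a batch size is chosen. The sketch below works
this out; the function name and the 1000 µL batch volume are
illustrative assumptions, not part of the published protocol.

```python
def ampholyte_volumes(total_volume_ul, percent_v_v=2.0, parts=(2, 1, 1)):
    """Split the carrier ampholyte share of a solution into its components.

    parts follows the order given in the text: Pharmalyte pH 4-6.5,
    Ampholine pH 6-8, Pharmalyte pH 8-10.5.
    Returns the volume of each component in microlitres.
    """
    mixture_ul = total_volume_ul * percent_v_v / 100.0
    unit_ul = mixture_ul / sum(parts)
    return tuple(p * unit_ul for p in parts)


# Assumed batch of 1000 µL rehydration solution.
print(ampholyte_volumes(1000.0))  # (10.0, 5.0, 5.0)
```

For the assumed 1000 µL batch, 20 µL of carrier ampholytes are needed
in total, half of which is Pharmalyte pH 4-6.5.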

2.4 Experimental determination of pI values

The determination of the pK differences between Immobilines pK 4.6,
pK 6.2 and pK 7.0 necessary for the calibration of the pH scale at 25°C
in 9.8 M urea was done as described in [9] with the same narrow-range
IPGs. The pH scale was defined by setting the pK value of Immobiline
pK 4.6 equal to 4.61 [9] and the determined pK differences gave the pK
values of Immobilines pK 6.2 and pK 7.0 equal to 5.73 and 6.54,
respectively. The pK differences found are in good agreement with
values derived from [17] and [8] by extrapolation to 9.8 M urea
concentration. As in [9], additional narrow-range recipes have been
used for determining pI values. With narrow-range IPGs extending to pH
values higher than the pK value of Immobiline pK 7.0, anodic sample
application was used with acetic acid added to the sample solution.
Otherwise, cathodic sample application was used with the same sample
buffer as for wide-range IPGs.

2.5 Protein compositions used for pI calculations

With the exception of vimentin, protein compositions
are from the Swiss-Prot database [18]. For vimentin, we
used the data from [19], where the amino acid at posi-
tion -1 is a D instead of a S. Information in the Swiss-
Prot database on phosphorylation has been disregarded
because it was known from earlier studies (J. E. Celis,
unpublished results) that the spots in question corre-
spond to the unphosphorylated forms of the peptides
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different substituems on ihe c-carbon were taken m- > 
account. The calculations of pI values were made with
the aid of the IPG-maker program [20].

2.7 pK values used for pI calculations

For the carboxyl terminal group and internal glutamyl and aspartyl
residues the same pK values were used as in [9]. For C-terminal
glutamyl and aspartyl residues, separate pK values were derived with
the aid of the Taft equations [9, 21]. The pK values of histidyl groups
were calculated from the pI values of human carbonic anhydrase I as in
[9]. For N-terminal glycine a pK value of 7.50 was used. The pK shift
caused by a substituent on the α-carbon was assumed to be identical
with the pK shift the substituent caused for the amino group in the
amino acid, i.e. 2.28 pH units were subtracted from the pK values for
the amino groups in the amino acids given in [21 23]. The approximate
pK value of 9 for the cysteinyl group was taken from [24]. For tyrosyl
and arginyl groups we used the pK values for the amino acids [21 23].
For lysyl groups the effect of high urea concentration on amino groups
was taken into account and 0.5 pH units were subtracted from the amino
acid pK value. These last three pK values are far from the pH range
under study and the results found would have been the same if lysyl and
arginyl groups were assumed to be fully ionized while the ionization of
tyrosyl groups was neglected. A complete list of the pK values used is
given in Table 1.
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2.6 Calculation of pi values 

For ihe p/ calculations it was assumed that the same pA' 
value could be used for an amino acid residue in all 
polypeptides and in all positions in the peptide except 
ior \. or C-terminally placed amino acids. For the pA' 
values ot the A-icrminal amino groups the effect of the 
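
The kind of calculation described in Sections 2.6 and 2.7 can be sketched as follows. This is a minimal illustration, not the IPG-maker program [20]: it assumes one average pK per residue type, Henderson-Hasselbalch ionization and a bisection search for the pH of zero net charge. Apart from the N-terminal value of 7.5 and the cysteinyl value of 9 quoted in the text, the pK values and the peptide are illustrative placeholders and not the Table 1 values. Residue-specific terminal pK values are not modeled.

```python
from typing import Dict

# ASSUMPTION: illustrative pK values only; they are NOT the Table 1 values.
# Only the N-terminal glycine value (7.5) and the cysteinyl value (9) are
# quoted in the text.
ACIDIC_PK: Dict[str, float] = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
BASIC_PK: Dict[str, float] = {"H": 6.0, "K": 10.0, "R": 12.0}
PK_C_TERMINUS = 3.55
PK_N_TERMINUS = 7.5


def net_charge(seq: str, ph: float, n_blocked: bool = False) -> float:
    """Net charge of an unfolded polypeptide at a given pH."""
    charge = 0.0
    for residue in seq:
        if residue in ACIDIC_PK:  # deprotonated acidic groups carry -1
            charge -= 1.0 / (1.0 + 10.0 ** (ACIDIC_PK[residue] - ph))
        elif residue in BASIC_PK:  # protonated basic groups carry +1
            charge += 1.0 / (1.0 + 10.0 ** (ph - BASIC_PK[residue]))
    charge -= 1.0 / (1.0 + 10.0 ** (PK_C_TERMINUS - ph))  # free C-terminus
    if not n_blocked:  # a blocked (e.g. acetylated) N-terminus has no charge
        charge += 1.0 / (1.0 + 10.0 ** (ph - PK_N_TERMINUS))
    return charge


def isoelectric_point(seq: str, n_blocked: bool = False, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid, n_blocked) > 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "GSDEEKRHADELLK"  # hypothetical sequence, not from Table 2
    print(isoelectric_point(peptide), isoelectric_point(peptide, n_blocked=True))
```

Passing `n_blocked=True` removes the N-terminal amino group from the charge sum, so the calculated pI of the same sequence moves to a lower pH.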



2.8 Statistical analysis 

Statistical comparisons of the experimental and calculated pI values were done
on an Apple Macintosh IIsi using the statistical package Statistica/Mac,
release 3.0b (from StatSoft Inc., Tulsa, Oklahoma). Calculated and
experimental pI values were compared by the t-test for correlated samples
(paired t-test). The normality of pI differences was estimated graphically by
probability plots. The variances of the data presented here and the similar
data on plasma and liver proteins in [9] were compared by the F-test.
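
As a rough illustration of the two tests named above, the sketch below computes the paired t statistic and the variance ratio F using only the Python standard library. It is not the Statistica/Mac procedure, and the function names are hypothetical. Converting the statistics into p-levels needs the t and F distributions, which a statistics package would supply (for example `scipy.stats.ttest_rel` for the paired test).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def paired_t(experimental: Sequence[float], calculated: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for (experimental - calculated)."""
    diffs = [e - c for e, c in zip(experimental, calculated)]
    n = len(diffs)
    # Sample variance (n - 1 in the denominator) of the paired differences
    t = mean(diffs) / math.sqrt(variance(diffs) / n)
    return t, n - 1


def variance_ratio(var_a: float, var_b: float) -> float:
    """F statistic with the larger variance in the numerator."""
    return max(var_a, var_b) / min(var_a, var_b)


if __name__ == "__main__":
    # Hypothetical pI values, for illustration only
    exp_pi = [5.0, 6.0, 7.0]
    calc_pi = [4.9, 6.1, 6.8]
    print(paired_t(exp_pi, calc_pi))
    print(variance_ratio(0.005, 0.001))
```

Placing the larger variance in the numerator keeps F at or above 1, so only the upper tail of the F distribution has to be consulted.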

3 Results and discussion 

3.1 Identification of polypeptides and pI determinations

The 2-D gel maps of [35S]methionine-labeled proteins from noncultured,
unfractionated normal human keratinocytes focused with the nonlinear IPG [...]
and CA-IEF pH gradients in the first dimension are shown in Figs. 1 and 2,
respectively. The IPG [...] pH values but otherwise the two patterns are very
similar and most of the spots in the IPG pattern [...] directly related to the
corresponding spots in the CA-IEF gel. To obtain comparable patterns it was
important to keep the focusing temperature as similar as possible. Compared to
other studies [1-4, 9, 10, 12-14] we increased the urea concentration in the
focusing gel to 9.8 M because keratins streaked badly in the focusing
dimension when 8 M urea was used, presumably due to aggregates of acidic and
basic keratins. An increase in urea concentration to 9 M or more eliminated
these streaks; apart from this effect, no other major changes in the focusing
positions were observed. In Fig. 1 we have indicated the positions of 41 known
proteins from the human keratinocyte 2-D gel database that are most likely
common to most human cell types. The choice was made because these proteins
are easy to identify with certainty. With the exception of stratifin (spot 2),
involucrin (spot 4) and keratin 14 (spot 15), which are all epithelial
markers, these proteins are also present in human fibroblasts (Fig. 3) and
lymphocytes (results not shown) and therefore can be used as landmarks for
comparing 2-D gel maps derived from different cell types. In Table 2 the 41
proteins are listed together with their sample spot numbers (SSP) in the human
keratinocyte protein database and pI values determined in 2-D gel maps
generated with narrow-range IPGs in the first dimension.

3.2 Comparison between the determined and calculated pI values for human
keratinocyte proteins

Thirty-six of the 41 proteins listed in Table 2 are found in the Swiss-Prot
database. Contrary to the plasma and liver proteins used in [9], the pI
calculations on the proteins used in this study posed some problems that
reflected the way in which they were characterized. The [...] proteins used in
this study have all been characterized by internal sequencing [7] and it is
known that N-terminal acetylation occurs with high frequency in eukaryotes.
According to Brown and Robert [25], proteins with acetylated N-terminals
correspond in weight to approximately 80% of the soluble protein in ascites
cells. Based on results from N-terminal sequencing, at least 40% of the spots
in the human liver protein 2-D gel map appear to be blocked [3]. The
corresponding number, derived from 107 spots in the 2-D gel map of human
T-lymphocyte proteins, falls between 60 and 65% (J. Strahler, personal
communication). Information concerning N-terminal blockage is not normally
available, and in the Swiss-Prot database only 6 of the 36 keratinocyte
proteins are specified as N-terminally blocked. We have, within the present
material, defined 18 proteins for which the N-terminals are very likely to be
correctly described. Six of these proteins are listed in the Swiss-Prot
database as N-terminally blocked, four represent proteins which appear in the
human liver 2-D gel map and have been N-terminally sequenced as liver proteins
[3], and the remaining eight have N-terminal groups other than M, S and A,
i.e. N-terminals for which N-acetylation is uncommon [26]. In Figs. 4A, B, C
and D, pI values calculated from Swiss-Prot database information are plotted
against the experimentally determined pI values for all the keratinocyte
proteins listed in Table 2 and for the 18 selected proteins, as well as for
the plasma and liver proteins (data from [9], valid for 10 °C).

Figure 4. [...] There are four plots: (A) the 36 polypeptides from normal
human keratinocytes (no corrections), (B) the 36 polypeptides from Fig. 4A
where pI values have been recalculated for 12 polypeptides with M, S and A as
N-terminal, assumed blocked based on calculated charge, (C) the 18 selected
polypeptides with information on the N-terminal configuration, and (D) plasma
and liver proteins.

The calculations show that without knowledge of the status of the N-terminal
group, precise predictions of pI values for eukaryotic proteins cannot be
achieved based on the information available in Swiss-Prot and similar
databases. However, for proteins where the N-terminal status is known, we find
good correlation between predicted and experimental pI values. When the
variance of the pI discrepancies and the variance of calculated charges at the
experimental pI values derived from the present data set are compared with the
corresponding values derived from the data on plasma and liver proteins in [9]
(Table 3), the present data are found to result in larger variances for the
values of both pI discrepancies and calculated charge at the experimental pI
value when no information on posttranslational modification is taken into
consideration. Correction for possible N-acetylation of 12 polypeptides with
M, S and A as N-terminal results in a smaller variance of pI discrepancies,
although not significantly different from values derived from [9], whereas the
variance of the calculated charge at the experimental pI value is
significantly higher. For the 18 selected proteins the variance for the pI
discrepancies is significantly smaller than for the data in [9]; however, the
corresponding value for calculated charge at the experimental pI value does
not improve to the same extent. This, we believe, reflects another difference
between the two sets of proteins used for the calculations. Based on spot
distributions in 2-D gel maps, the set of proteins used here has a molecular
weight distribution that is more representative of the patterns observed in
mammalian cells. In the study by Bjellqvist et al. [9] most of the high
molecular weight plasma proteins had to be excluded due to their unknown
content of sialic acid, which made the proteins analyzed in that study heavily
biased towards low molecular weight proteins. The buffer capacity of proteins
normally increases with the protein's molecular weight, and the average buffer
capacity of the presently selected proteins with assumed known N-terminals is
18 charge units/pH unit, while the corresponding value for the proteins used
in [9] is only 9 charge units/pH unit. High buffer capacity can be expected to
improve the agreement between calculated and experimental pI values.
Inspection of the data presented in Table 2 for the polypeptides with assumed
known N-terminals verifies the importance of the buffer capacity. For 8
polypeptides having buffer capacities higher than 15 charge units/pH unit, the
calculations in all cases yielded pI discrepancies with absolute values of
less than 0.02 pH units. The largest discrepancy, 0.06 pH units, was observed
for annexin II and stathmin, proteins which have low buffer capacity: [...]
and 6.6 charge units/pH unit, respectively. The probability that the focusing
position of a protein with known composition will fall within a certain
distance from the calculated pI value therefore cannot be predicted by the
variance alone. The buffer capacity of the specific protein must be taken into
consideration as well. As indicated by the decrease of the variance of
calculated charges at the experimental pI value for the selected proteins, the
observed improvement cannot solely be due to the higher buffer capacity of the
keratinocyte proteins. The two studies relate to different experimental
conditions. Good agreement between experimental and calculated pI values
implies that the proteins are unfolded, and a factor that may contribute to
the observed improvement is a more complete unfolding of proteins caused by
the higher temperature and urea concentration used in this study.
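
The role of the buffer capacity can be made concrete with a small worked sketch. It assumes a linear titration curve close to the pI, so that a charge error and the resulting pI discrepancy are related by the buffer capacity (charge units/pH unit). The function names are hypothetical. The numbers in the usage lines are those quoted in the paragraph above.

```python
def pi_shift_per_charge(buffer_capacity: float) -> float:
    """Approximate pI shift (pH units) caused by one extra charge unit.

    ASSUMPTION: the titration curve is linear near the pI.
    """
    return 1.0 / buffer_capacity


def charge_equivalent(pi_discrepancy: float, buffer_capacity: float) -> float:
    """Charge error (charge units) that corresponds to a given pI discrepancy."""
    return pi_discrepancy * buffer_capacity


if __name__ == "__main__":
    # Average buffer capacities quoted above: 18 (this study) and 9 (data in [9])
    print(pi_shift_per_charge(18.0), pi_shift_per_charge(9.0))
    # A 0.06 pH unit discrepancy at 6.6 charge units/pH unit
    print(charge_equivalent(0.06, 6.6))
```

Under this approximation, the same charge error gives half the pI discrepancy when the buffer capacity doubles, which is why the variance of the pI discrepancies alone does not characterize the precision for an individual protein.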

The data indicated that the precision with which pI values can be predicted
for polypeptides with high buffer capacity is better than the precision with
which experimental pI values can be determined. If the pH is defined through
the pK values of the immobilized groups in the IPG-containing gel, the
precision of the experimentally calculated data will depend on the pH
difference between the pI and the pK value of the immobilized group with the
closest pK. For the present study this will give pI determinations with a
precision varying in the range of ±0.02-0.05 pH units [9]. The good agreement
observed between the calculated and experimental pI values is due to the fact
that errors are mainly systematic and, as discussed in [9], they will largely
be cancelled out in the calculations. A pH scale defined through the presently
determined pI values will not necessarily reflect the variation of the
hydrogen ion activity during the focusing step in an optimal way, but it still
allows precise predictions of focusing positions for polypeptides with known
compositions, including information on posttranslational modifications.
Calculated net charge at the experimentally found isoelectric point defined in
this scale will serve as a tool to verify that the polypeptide composition
used in the calculation is correct and complete. Exceptions to this are
proteins such as involucrin and heat shock protein 90 that have very high
buffer capacities. Introduction of an extra charge unit into these proteins
will only result in pI shifts falling in the range of 0.01-0.02 pH units and
the effect is that the quality of the pH definition - the precision by which
pK values used in the calculations are given and the precision of experimental
pI values in these cases - will limit the possibilities to verify polypeptide
composition based on the experimental pI value.

Table 3. Means and variances for differences (experimental pI - calculated pI)
in pH units and calculated charges at the experimental pI

Columns: (1) plasma and liver proteins (8 M urea, 10 °C); (2)-(4) keratinocyte
proteins (9.8 M urea, 25 °C): (2) all peptides, (3) all peptides after
correction for N-acetylation, (4) known N-terminal configuration (or very
likely configuration).

                                          (1)       (2)       (3)       (4)
Number of proteins                        29        36        36        18
Experimental pI - calculated pI
  Mean                                    -0.011    0.072     0.019     0.005
  Variance                                0.005     0.017     0.003     0.001
  F-value (pI discrepancy) a)             1         3.4       1.67      5
  p-level (pI discrepancy) b)             0.5       0.0005    0.0721    0.0004
Calculated charge at the experimental pI value
  Mean                                    -0.070    0.[?]21   0.009     -0.014
  Variance                                0.227     0.871     0.444     0.109
  F-value (calculated charge) a)          [...]     [...]     1.96      2.08
  p-level (calculated charge) b)          [...]     0.0002    0.0338    0.0536

a) Comparison to the data in [9]. F = s1^2/s2^2, where s1^2 is the larger of
   the two variances.
b) [...] F-value, where ν1 and ν2 are the degrees of freedom for s1 and s2,
   respectively.

Statistical comparison of experimental and calculated pI values was done using
the t-test for dependent samples, and normality of the discrepancies was
estimated by probability plots. For the 36 proteins, the p-level is 0.0021,
indicating that a result like this is unlikely to be a chance effect and must
be assumed to represent a real difference. After correction for the most
likely N-terminal configuration, the p-level is 0.043, and the experimental
and calculated values cannot be accepted as representing the same population
since the p-level is less than 0.05, the traditional limit of statistical
significance. For the 18 proteins with a known or very likely N-terminal
configuration the t-test gave a p-level of 0.49, which verifies that the
experimental and calculated pI values are not significantly different.

Besides showing that pI values for denatured proteins with known compositions
can be calculated with a high degree of precision from average pK values, the
results also provide strong support for the notion that N-terminal blockage
heavily depends on the nature of the N-terminal groups [26]. The results seem
to indicate that with N-terminals other than M, S and A, only a few proteins
have blocked N-terminals (1 out of 10 proteins in the present study), while it
can be inferred from the data presented in Table 2 that a majority of the
proteins with M, S and A as N-terminal are blocked. After correction for the
effect of suspected N-terminal blockage there is only one protein (nucleolar
protein B23) out of the 36 used in this study which, in spite of a high buffer
capacity, has a marked difference of 0.11 pH units between predicted and
determined pI values (Fig. 4B); this corresponds to 3 charge units due to the
high buffer capacity of this protein. This discrepancy in pI prediction and
calculation of net charge at the pI is probably not due to deficiencies in the
database information but instead reflects a shortcoming of the model used for
pI calculations. Nucleolar protein B23 contains a domain extremely rich in
aspartic and glutamic acid residues (Table 4), in which 26 out of 28 amino
acid residues from position 161 to 188 are either a D or an E. A calculation
based on the use of average pK values uninfluenced by the charged neighboring
amino acid residues cannot be expected to correctly describe the pI value with
almost half of the acidic groups packed together into a highly negatively
charged region. The limitation caused by calculations based on average pK
values does not severely limit the usefulness of the approach since a search
through Swiss-Prot shows that this type of D/E-rich motif is uncommon, and the
existence of a highly charged region is immediately apparent upon inspection
of the amino acid sequence.
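
A sequence inspection of this kind can be sketched as a sliding-window count of D and E residues. The window length (28) and threshold (26) are taken from the B23 domain described above. The function name and the test sequence are hypothetical, and this is not the Swiss-Prot search used in the study.

```python
from typing import List, Tuple


def acidic_windows(seq: str, window: int = 28, min_acidic: int = 26) -> List[Tuple[int, int, int]]:
    """Return (start, end, count) for every window rich in D/E residues.

    Positions are 1-based and inclusive, as in sequence database entries.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        count = sum(1 for residue in seq[start:start + window] if residue in "DE")
        if count >= min_acidic:
            hits.append((start + 1, start + window, count))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence with an artificial 28-residue acidic stretch
    toy = "MKTAYIAK" + "DE" * 14 + "GLSK"
    for start, end, count in acidic_windows(toy):
        print(start, end, count)
```

A protein flagged by such a scan is a candidate for which average pK values may not describe the pI correctly, for the reason given above for nucleolar protein B23.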

The quality of the information available in databases, especially concerning
posttranslational modifications, is a major problem when the data are to be
used for pI predictions. The p-level of 0.043 found for all 36 proteins after
correction for N-acetylation shows that this problem is not limited to
N-terminal blockage, and the very good agreement found for the eighteen
polypeptides with presumably correctly described N-terminals (Fig. 4C) must be
regarded as an exception from this point of view. N-terminal blockage is
generally the main problem in relation to pI predictions for eukaryotic
proteins. Of the 36 keratinocyte proteins analyzed, 18-20 are suspected to be
N-terminally blocked (6 proteins blocked according to Swiss-Prot, 12 proteins
with M, S or A as N-terminal and presumably blocked based on the calculated
charge, and two proteins, involucrin and nucleolar protein B23, with M as
N-terminal for which the data do not allow any conclusion). This is in
reasonable agreement with the conclusions based on the N-terminal sequencing
data derived in connection with 2-D gel electrophoresis. N-terminal blockage
can be suspected for 17-19 of the 26 proteins with M, S or A as N-terminal,
while only 1 in 10 proteins with other N-terminal groups is blocked. The
information that the frequency of N-terminal blockage is strongly related to
the nature of the N-terminal group will be of some help in connection with pI
predictions based on database information. However, without information from
other sources, an uncertainty will always remain as to whether the N-terminal
charge should be included in the pI calculation.
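
The rule of thumb that follows from these counts can be written down as a short sketch. It is only a heuristic for choosing a starting assumption in a pI calculation, and the function and constant names are hypothetical. A database annotation, where present, takes precedence, and the first letter of the sequence is assumed to be the N-terminal residue of the mature polypeptide.

```python
from typing import Optional

# N-terminal residues for which blockage was suspected for 17-19 of 26 proteins
LIKELY_BLOCKED_N_TERMINI = frozenset("MSA")


def n_terminal_blockage_suspected(seq: str, annotated_blocked: Optional[bool] = None) -> bool:
    """Heuristic guess whether the N-terminal charge should be left out.

    ASSUMPTION: seq[0] is the N-terminal residue of the mature polypeptide.
    """
    if annotated_blocked is not None:  # trust an explicit database annotation
        return annotated_blocked
    return seq[0] in LIKELY_BLOCKED_N_TERMINI


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only
    print(n_terminal_blockage_suspected("MSDEEK"))
    print(n_terminal_blockage_suspected("VLSPAD"))
```

Because roughly a third of the M, S or A cases and 1 in 10 of the others go against the rule in the present data, the result is a prior to be checked against the experimental pI, not a substitute for N-terminal sequencing.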



4 Concluding remarks 

The data presented here lay the foundation for comparing 2-D gel protein maps
of different cell types generated with nonlinear, wide-range IPGs in the first
dimension. The focusing positions of 41 polypeptides common to most human cell
types have been described in a pH scale that allows focusing positions to be
predicted with a high degree of accuracy, provided that the composition of the
polypeptides is known and that information on posttranslational modifications
is available. For polypeptides with a very high buffer capacity, the limiting
factor is the precision with which experimental pH values can be determined
rather than the precision of the calculations. Possible deficiencies in the
pH scale's description of the variation of the hydrogen ion activity have, at
least at present, no consequences for its practical use. The major limitation
in connection with predictions of focusing positions from polypeptide
compositions is the quality of existing data on protein compositions,
especially concerning posttranslational modifications. Amino acid sequences
have been reasonably easy to obtain, while posttranslational modifications
have been difficult and work-intensive to determine. Recent developments in
the field of mass spectrometry are fast changing this situation and within the
next years we can expect a surge in reliable data in this area. While awaiting
this development, verification of correctness and completeness of available
information on polypeptide composition can be provided by experimental pI
values in a pH scale based on the pI values determined in this study. So far,
our data cover the pH range below pH ≈ 7.5. The basic pH range covered by
NEPHGE as first dimension will be covered in forthcoming work.

Received December [?]9, 1993



5 References

[1] Gianazza, E., Astrua-Testori, S., Caccia, P., Giacon, P., Quaglia, L.,
    Righetti, P. G., Electrophoresis 1986, 7, 76-83.
[2] Görg, A., Postel, W., Günther, S., Electrophoresis 1988, 9, 531-546.
[3] Hochstrasser, D. F., Frutiger, S., Paquet, N., Bairoch, A., Ravier, F.,
    Pasquali, C., Sanchez, J.-C., Tissot, J.-D., Bjellqvist, B., Vargas, R.,
    Appel, R. D., Hughes, G. J., Electrophoresis 1992, 13, 992-1001.
[4] Immobiline DryStrip Kit for 2-D Electrophoresis: Instructions, Pharmacia
    LKB Biotechnology AB, Uppsala 1993.
[5] Anderson, N. L., Hickman, B. J., Anal. Biochem. 1979, 93, 312-320.
[6] Neidhardt, F. C., Appleby, D. A., Sankar, P., Hutton, M. E., Phillips,
    T. A., Electrophoresis 1989, 10, 116-121.
[7] Rasmussen, H. H., Van Damme, J., Puype, M., Gesser, B., Celis, J. E.,
    Vandekerckhove, J., Electrophoresis 1992, 13, 960-969.
[8] Gianazza, E., Artoni, G., Righetti, P. G., Electrophoresis 1983, 4,
    321-326.
[9] Bjellqvist, B., Hughes, G. J., Pasquali, C., Paquet, N., Ravier, F.,
    Sanchez, J.-C., Frutiger, S., Hochstrasser, D. F., Electrophoresis 1993,
    14, 1023-1031.
[10] Bjellqvist, B., Pasquali, C., Ravier, F., Sanchez, J.-C., Hochstrasser,
    D. F., Electrophoresis 1993, 14, 1357-1365.
[11] O'Farrell, P. H., J. Biol. Chem. 1975, 250, 4007-4021.
[12] Görg, A., Biochem. Soc. Transactions 1993, 21, 130-132.
[13] Hanash, S. M., Strahler, J. R., Neel, J. V., Hailat, N., Melhem, R.,
    Keim, D., Zhu, X. X., Wagner, D., Gage, D. A., Watson, J. T., Proc. Natl.
    Acad. Sci. USA 1991, 88, 5709-5713.
[14] Görg, A., Postel, W., Friedrich, C., Kuick, R., Strahler, J. R., Hanash,
    S. M., Electrophoresis 1991, 12, 653-658.
[15] Celis, J. E., Rasmussen, H. H., Olsen, E., Madsen, P., Leffers, H.,
    Honoré, B., Dejgaard, K., Gromov, P., Hoffmann, H. J., Nielsen, M.,
    Vassilev, A., Vintermyr, O., Hao, J., Celis, A., Basse, B., Lauridsen,
    J. B., Ratz, G. P., Andersen, A. H., Walbum, E., Kjærgaard, I., Puype,
    M., Van Damme, J., Detay, B., Vandekerckhove, J., Electrophoresis 1993,
    14, 1091-1198.
[16] Celis, J. E., Madsen, P., Rasmussen, H. H., Leffers, H., Honoré, B.,
    Gesser, B., Dejgaard, K., Olsen, E., Magnusson, N., Kiil, J., Celis, A.,
    Lauridsen, J. B., Basse, B., Ratz, G. P., Andersen, A. H., Walbum, E.,
    Brandstrup, B., Pedersen, P. S., Brandt, N. J., Puype, M., Van Damme, J.,
    Vandekerckhove, J., Electrophoresis 1991, 12, 802-872.
[17] Bjellqvist, B., Ek, K., Righetti, P. G., Gianazza, E., Görg, A., Postel,
    W., Westermeier, R., J. Biochem. Biophys. Methods 1982, 6, 317-339.
[18] Bairoch, A., Boeckmann, B., Nucleic Acids Res. 1991, 19, 2247-2249.
[19] Honoré, B., Madsen, P., Basse, B., Andersen, A., Walbum, E., Celis,
    J. E., Leffers, H., Nucleic Acids Res. 1990, 18, 6692.
[20] Altland, K., Electrophoresis 1990, 11, 140-147.
[21] Perrin, D. D., Dempsey, B., Serjeant, E. P., pKa Prediction for Organic
    Acids and Bases, Chapman and Hall Ltd., London 1981.
[22] Perrin, D. D., Dissociation Constants of Organic Bases in Aqueous
    Solution, Butterworths, London 1965.
[23] Perrin, D. D., Dissociation Constants of Organic Bases in Aqueous
    Solution, Supplement 1972, Butterworths, London 1972.
[24] Altland, K., Becher, P., Rossmann, U., Bjellqvist, B., Electrophoresis
    1988, 9, 474-485.
[25] Brown, J. L., Robert, W. K., J. Biol. Chem. 1976, 251, 1009-1014.
[26] Persson, B., Flinta, C., von Heijne, G., Jörnvall, H., Eur. J. Biochem.
    1985, 152, 523-527.






Bengt Bjellqvist*
Bodil Basse 
Eydfinnur Olsen 
Julio E. Celis 

Institute of Medical Biochemistry 
and Danish Centre for Human 
Genome Research. Aarhus 
University, Aarhus 






Reference points for comparisons of two-dimensional 
maps of proteins from different human cell types 
defined in a pH scale where isoelectric points correlate 
with polypeptide compositions 

A highly reproducible, commercial and nonlinear, wide-range immobilized pH
gradient (IPG) was used to generate two-dimensional (2-D) gel maps of
[35S]methionine-labeled proteins from noncultured, unfractionated normal
human epidermal keratinocytes. Forty-one proteins, common to most human cell
types and recorded in the human keratinocyte 2-D gel protein database, [...]
determined using narrow-range IPGs. The latter established a pH scale that
[...] in the first dimension or with different human protein samples. Of the
41 proteins [...] calculated and experimental pI values for polypeptides with
known composition. The variance calculated for the discrepancies between
calculated and experimental [...] by the t-test for dependent samples (paired
test) gave a p-level of 0.49, indicating that there is no significant
difference between the calculated and experimental pI values. The precision of
the calculated [...] improved with increased buffer capacity. As shown here,
the widely available information on protein sequences cannot, a priori, be
assumed to be sufficient for calculating pI values because posttranslational
modifications, in particular N-terminal blockage, pose a major problem. Of the
36 proteins analyzed in this study, 18-20 were found to be N-terminally
blocked and of these only 6 were [...] databases. The probability of
N-terminal blockage depended on the nature of the N-terminal group. Twenty-six
of the proteins had either M, S or A as N-terminal amino acids and of these
17-19 were blocked. Only 1 in 10 proteins containing other N-terminal groups
were blocked.



1 Introduction 

As compared with carrier ampholyte isoelectric focusing (CA-IEF), the
application of immobilized pH gradients (IPGs) in the first dimension in 2-D
gel electrophoresis offers improved reproducibility [1] because the nature of
the pH gradient makes the resulting focusing positions insensitive to the
focusing time [2] and to the type of sample applied [3]. The recently
introduced ready-made IPG strips [4] seem to be an ideal substitute for the
carrier ampholyte gradients, which until now have been the most commonly used
first dimensions in 2-D gel electrophoresis. The availability of standardized
first dimensions opens the possibility of comparing 2-D gel maps of various
cell types generated in different laboratories, provided that the focusing
positions of a number of easily recognizable polypeptide spots common to the
cell types in question are known. Even though this approach is limited to
experiments performed with the same standardized IPG, the flexibility provided
by IPGs allows the pH gradient to be adjusted to the requirements of a
particular experiment.

Exchange and communication of 2-D gel protein data requires a pH scale that is
independent of the particular IPG used and by which the results can be
described. The introduction of carbamylation trains and the relation of
focusing positions to the spots in these trains represented a step forward
towards solving the reproducibility problem experienced with carrier ampholyte
focusing [5]. Problems associated with the use of carbamylation trains were
mainly due to lack of temperature control and to the use of nonequilibrium
focusing conditions. Accordingly, the pattern variation involved not only the
resulting pH gradients, but also the relative spot positions as related to
each other and to spots in the carbamylation trains. Even though the question
of reproducibility has, to a large extent, been solved, the carbamylation
trains are still not ideal as markers because the spots in the trains do not
represent defined entities but rather a large number of differently
carbamylated peptides having close pI values. As a result, the spots are large
and poorly defined as compared to the ordinary polypeptide spots in 2-D gel
maps.

Neidhardt et al. [6] defined the pH gradient in 2-D gel experiments by pI
markers whose pI values were calculated from the amino acid composition.
Focusing positions of other polypeptides could be predicted from their
composition but the pK values needed for the pI calculations were unknown.
Various groups employing this approach do not use the same pK values [6, 7]
and therefore, the pI values derived in this way cannot be expected to
describe the variation of the hydrogen ion activity. In spite of this fact, it
is still possible to make approximate predictions of focusing positions
because the pK values used to define the pH gradient are also used to
calculate pI values and to predict the focusing positions. Errors in pK
assignments are therefore compensated. A pH scale which correctly reflects the
variation in hydrogen ion activity during focusing should improve the
precision of the predictions, but this has never been implemented with CA-IEF
focusing as a first dimension in 2-D gel electrophoresis. The main reason for
this is the problems associated with pH measurements in focused gels
containing high concentrations of urea.

IPGs can be described from the concentration variation of the immobilized
groups, provided that the pK values of these groups are known for the
conditions prevailing during focusing. To avoid measurements on gels, Gianazza
et al. [8] suggested the use of pK values derived by addition of determined pK
shifts. Recently, direct determinations of pK differences between immobilized
groups in IPGs were made by determining pI-pK values in overlapping
narrow-range IPGs [9, 10] and the results verified the applicability of the
Gianazza approach. A description of the focusing results in a pH scale, which
correctly describes the variation of the hydrogen ion activity for the
focusing conditions used, not only allows the comparison of 2-D gel maps
generated with different IPGs, but also opens the possibility for correlating
the focusing position of a polypeptide with its composition [9]. Experiments
by Bjellqvist et al. [9, 10] have implied that pH scales showing good
correlation between calculated and experimental pI values can be derived for
any of the conditions commonly used for focusing in connection with 2-D gel
electrophoresis. These pH scales are then defined through the pK values of the
immobilized groups in the IPG-containing gel. To be useful for interlaboratory
comparisons, however, the pH scale has to be defined through pI values of
easily recognizable spots present in the 2-D gel map. So far, pI
determinations in a useful pH scale, combined with determinations of pK values
needed for pI calculations, have only been made for the pH range 4.5-6.[?] at
10 °C [9]. CA-IEF focusing as described by O'Farrell [11] does not control the
temperature of the first dimension, which can be expected to be slightly above
room temperature. With IPGs, the temperature commonly used is about 20 °C
[4, 12] or 25 °C [13] and this is a critical parameter that needs to be
controlled [14].



«nd a focusing temperature of 25C. We haxe us- -or* 
mercia! nonlinear, wide-range IPG strips which g!ie\b 
gel maps that are closely similar to the ones resucmg 
with the CA-IEF technique used to establish the huma- 
kenunocyte database [15). As an initial st- r towards 
imerlaboratory comparisons of results obtained with the 
nonlinear gradient as a first dimension we report her; 
on the focusing positions of 41 known proteins that are 
common to most human cell types. The pH range 
covered corresponds to the range in classical CA-IEF 
2-D gel electrophoresis and in order to use these pro- 
teins as internal standards for comparing 2-D gel maps 
generated with other IPGs we determined their pi values 
with narrow-range IPGs in the first dimension. We have 
compared the calculated versus experimental pi values 
and show that it is necessary to have further information 
(absence or presence and nature of posnranslaiional 
modifications), in addition to amino acid composition to 
be able to calculate pi values thai correspond to the 
actual experimental values. The pA values used for the 
calculations are provided and the usefulness of pi predic- 
tion in relation to database information is discussed. 
Furthermore, we comment on the possibilitv of using 
experimentally determined pi values to verify the avail- 
able database information on polypeptide composition. 

2 Materials and methods 
2.1 Apparatus and chemicals 

Equipment for isoelectric focusing and horizontal SDS
electrophoresis (Multiphor II electrophoresis chamber,
Immobiline strip tray, Multidrive XL programmable
power supply, Macrodrive power supply and Multitemp
II) was from Pharmacia LKB Biotechnology AB
(Uppsala, Sweden). Vertical second-dimensional gels
were run in the home-made equipment described in [15].
The IPG strips with the wide-range nonlinear pH gra-
dient were either Immobiline DryStrip pH 3-10 NL,
180 mm, or alternatively 160 mm long IPG strips with a
corresponding pH gradient. In both cases the IPG strips
were delivered by Pharmacia LKB. Immobiline, Pharma-
lyte, Ampholine, GelBond as well as PAG film and the
ready-made horizontal SDS gels (ExcelGel XL SDS
12-14) were also from Pharmacia LKB. Purified proteins
and peptides were from Sigma (St. Louis, MO).

2.2 Sample preparation

Preparation and labeling of unfractionated keratinocytes
as well as fibroblasts have been described in [16]. Cells
were lysed in a solution containing 9.8 M urea, 2% w/v
NP-40, 100 mM DTT and 2% v/v Ampholine pH 7-9



The present work was designed to compare 2-D gel maps
of different cell types in a laboratory applying both
CA-IEF and IPG focusing at a common temperature. To
this end we have generated 2-D gel maps of proteins
from noncultured, unfractionated normal human epi-
dermal keratinocytes with IPG in the first dimension



2.3 2-D gel electrophoresis

First-dimensional focusing was performed according to
Görg et al. [2] with some minor modifications, as de-
scribed in [9]. Rehydration of the IPG strips was made
in a solution containing 9.8 M urea, 2% w/v CHAPS, 10
mM DTT and 2% v/v carrier ampholyte mixture. The car-
rier ampholyte mixture consisted of 2 parts Pharmalyte
4-6.5, 1 part Ampholine pH 6-8 and 1 part Pharmalyte
pH 8-10.5. Usually, cathodic sample application was
used and the samples were diluted 2-20 times in a solu-
tion containing 9.8 M urea, 4% w/v CHAPS, 1% w/v
DTT and 35 mM Tris base. For anodic application, the
Tris base was substituted with 100 mM acetic acid. The
degree of dilution and sample volume (20-100 µL)
depended on the particular sample and the IPG, and
whether visualization of the proteins was to be done by
Coomassie Brilliant Blue or silver staining. With the
wide-range nonlinear IPG, 10-30 µg of total protein
was loaded for silver staining and 100-200 µg for Coo-
massie staining. Focusing was done overnight with Vh
products in the range of 45-60 kVh with 160 mm long
strips and 50-70 kVh with 180 mm long strips. Solubili-
zation of polypeptides and blocking of -SH groups prior
to the second-dimensional run, as well as loading on the
second-dimensional gel, were done as described in [9].
The stacking gel was omitted and 5-10 mm were left at
the top of the second-dimensional gel for applying the
IPG strip. The space was filled with electrode buffer con-
taining 0.5% w/v agarose. Casting, running, staining and
autoradiography were carried out as described in [15].

2.4 Experimental determination of pI values

The determination of the pK differences between Immo-
bilines pK 4.6, pK 6.2 and pK 7.0 necessary for the cali-
bration of the pH scale at 25 °C in 9.8 M urea was done
as described in [9] with the same narrow-range IPGs.
The pH scale was defined by setting the pK value of
Immobiline pK 4.6 equal to 4.61 [9] and the determined
pK differences gave the pK values of Immobilines pK 6.2
and pK 7.0, equal to 5.73 and 6.54, respectively. The pK
differences found are in good agreement with values de-
rived from [17] and [8] by extrapolation to 9.8 M urea
concentration. As in [9], additional narrow-range recipes
have been used for determining pI values. With narrow-
range IPGs extending to pH values higher than the pK
value of Immobiline pK 7.0, anodic sample application
was used with acetic acid added to the sample solution.
Otherwise, cathodic sample application was used with
the same sample buffer as for wide-range IPGs.

2.5 Protein compositions used for pI calculations

With the exception of vimentin, protein compositions
are from the Swiss-Prot database [18]. For vimentin, we
used the data from [19], where the amino acid at posi-
tion is a D instead of a S. Information in the Swiss-
Prot database on phosphorylation has been disregarded
because it was known from earlier studies (J. E. Celis,
unpublished results) that the spots in question corre-
sponded to the unphosphorylated forms of the peptides.

different substituents on the α-carbon were taken into
account. The calculations of pI values were made with
the aid of the IPG-maker program [20].
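
As a rough illustration of this kind of calculation, the sketch below finds the pH of zero net charge from counts of ionizable groups, using one Henderson-Hasselbalch term per group type. It is not the IPG-maker program [20]. The function names are hypothetical, the example composition is made up, and the pK values are placeholders chosen for illustration rather than the values used in this study.

```python
# Sketch of a pI calculation from composition. The pK values are
# illustrative placeholders, NOT the values used in the paper.
ACIDIC_PK = {"C_TERM": 3.5, "D": 4.0, "E": 4.4, "C": 9.0, "Y": 10.0}
BASIC_PK = {"N_TERM": 7.5, "H": 6.0, "K": 10.0, "R": 12.0}


def net_charge(counts, ph, n_terminal_blocked=False):
    """Net charge at a given pH, one Henderson-Hasselbalch term per group type."""
    charge = 0.0
    for group, n in counts.items():
        if group == "N_TERM" and n_terminal_blocked:
            continue  # a blocked (e.g. acetylated) N-terminal carries no charge
        if group in ACIDIC_PK:
            # fraction deprotonated, each carrying -1
            charge -= n / (1.0 + 10.0 ** (ACIDIC_PK[group] - ph))
        elif group in BASIC_PK:
            # fraction protonated, each carrying +1
            charge += n / (1.0 + 10.0 ** (ph - BASIC_PK[group]))
    return charge


def isoelectric_point(counts, n_terminal_blocked=False, lo=0.0, hi=14.0):
    """pH of zero net charge by bisection (net charge falls as pH rises)."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if net_charge(counts, mid, n_terminal_blocked) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def buffer_capacity(counts, ph, n_terminal_blocked=False, step=0.01):
    """Charge units per pH unit: magnitude of the slope of the charge curve."""
    upper = net_charge(counts, ph + step, n_terminal_blocked)
    lower = net_charge(counts, ph - step, n_terminal_blocked)
    return abs(upper - lower) / (2.0 * step)


# Hypothetical counts of ionizable groups (made up for illustration).
example = {"C_TERM": 1, "N_TERM": 1, "D": 12, "E": 15,
           "H": 4, "K": 14, "R": 8, "C": 2, "Y": 5}
pi_free = isoelectric_point(example)
pi_blocked = isoelectric_point(example, n_terminal_blocked=True)
print(f"pI free N-terminal:    {pi_free:.2f}")
print(f"pI blocked N-terminal: {pi_blocked:.2f}")
```

Dropping the N-terminal term removes one positive charge, so the blocked form always comes out at a lower pI than the free form; how much lower depends on the slope of the charge curve near the pI.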

2.7 pK values used for pI calculations

For the carboxyl terminal group and internal glutamyl
and aspartyl residues the same pK values were used as in
[9]. For C-terminal glutamyl and aspartyl residues, sep-
arate pK values were derived with the aid of the Taft
equations [9, 21]. The pK values of histidyl groups were
calculated from the pI values of human carbonic anhy-
drase I as in [9]. For N-terminal glycine a pK value of
7.50 was used. The pK shift caused by a substituent on
the α-carbon was assumed to be identical with the pK
shift the substituent caused for the amino group in the
amino acid, i.e. 2.28 pH units were subtracted from the
pK values for the amino groups in the amino acids given
in [22, 23]. The approximate pK value of 9 for the cys-
teinyl group was taken from [24]. For tyrosyl and arginyl
groups we used the pK values for the amino acids [22,
23]. For lysyl groups the effect of high urea concentra-
tion on amino groups was taken into account and 0.5 pH
units were subtracted from the amino acid pK value.
These last three pK values are far from the pH range
under study and the results found would have been the
same if lysyl and arginyl groups were assumed to be
fully ionized while the ionization of tyrosyl groups was
neglected. A complete list of the pK values used is given
in Table 1.

2.6 Calculation of pI values

For the pI calculations it was assumed that the same pK
value could be used for an amino acid residue in all
polypeptides and in all positions in the peptide except
for N- or C-terminally placed amino acids. For the pK
values of the N-terminal amino groups the effect of the



2.8 Statistical analysis 

Statistical comparisons of the experimental and calcu-
lated pI values were done on an Apple Macintosh IIsi
using the statistical package Statistica/Mac, release 3.0b
(from StatSoft Inc., Tulsa, Oklahoma). Calculated and
experimental pI values were compared by the t-test for
correlated samples (paired t-test). The normality of pI
differences was estimated graphically by probability
plots. The variances of the data presented here and the
similar data on plasma and liver proteins in [9] were
compared by the F-test.
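
The two comparisons named here can be sketched with the Python standard library. This is only an illustration of the statistics, not the Statistica/Mac procedure used in the study: the function names are hypothetical, the pI values are invented, and putting the larger variance in the numerator of the F ratio is a convention assumed here.

```python
import math
from statistics import mean, stdev, variance


def paired_t(experimental, calculated):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    # Differences are taken as calculated minus experimental.
    diffs = [c - e for e, c in zip(experimental, calculated)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def variance_ratio(diffs_a, diffs_b):
    """F statistic for two sample variances, larger variance on top."""
    va, vb = variance(diffs_a), variance(diffs_b)
    return max(va, vb) / min(va, vb)


# Invented pI values, for illustration only.
experimental_pi = [5.0, 6.0, 7.0]
calculated_pi = [5.1, 6.3, 7.2]
t_stat, dof = paired_t(experimental_pi, calculated_pi)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Turning the t statistic into a p-level requires the t distribution with the stated degrees of freedom, which the standard library does not provide; a statistics package or a printed table would be needed for that step.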

3 Results and discussion 

3.1 Identification of polypeptides and pI determinations

The 2-D gel maps of [35S]methionine-labeled proteins
from noncultured, unfractionated normal human kerati-
nocytes with the nonlinear, wide-range IPG and
CA-IEF pH gradients in the first dimension are shown
in Figs. 1 and 2, respectively. The IPG extends to higher
pH values but otherwise the two patterns are very sim-
ilar and most of the spots in the IPG pattern can be
directly related to the corresponding spots in the
CA-IEF gel. To obtain comparable patterns it was impor-
tant to keep the focusing temperature as similar as
possible. Compared to other studies (I — I. 9. 10. 12-u; 
we increased the urea concentration in the focusing gel
to 9.8 M because keratins streaked badly in the focusing
dimension when 8 M urea was used, presumably due to
aggregates of acidic and basic keratins. An increase in
urea concentration to 9 M or more eliminated these
streaks; apart from this effect, no other major changes in
the focusing positions were observed. In Fig. 1 we have
indicated the positions of 41 known proteins from the
human keratinocyte 2-D gel database that are most
likely common to most human cell types. The choice
was made because these proteins are easy to identify
with certainty. With the exception of stratifin (spot 2),
involucrin (spot 4) and keratin 14 (spot 15), which are all
epithelial markers, these proteins are also present in
human fibroblasts (Fig. 3) and lymphocytes (results not
shown) and therefore can be used as landmarks for com-
paring 2-D gel maps derived from different cell types. In
Table 2 the 41 proteins are listed together with their
sample spot numbers (SSP) in the human keratinocyte
protein database and pI values determined in 2-D gel
maps generated with narrow-range IPGs in the first
dimension.

3.2 Comparison between the determined and calculated
pI values for human keratinocyte proteins

Thirty six of the 41 proteins listed in Table 2 are found
in the Swiss-Prot database. Contrary to the plasma and
liver proteins used in [9], the pI calculations on the pro-
teins used in this study posed some problems that
reflected the way in which they were characterized. The
this study have all been characterized by internal
tion occurs with high frequency in eukaryotes.
According to Brown and Robert [25], proteins with acety-
lated N-terminals correspond in weight to approximately
80% of the soluble protein in ascites cells. Based on
results from N-terminal sequencing, at least 40% of the
spots in the human liver protein 2-D gel map appear to
be blocked [3]. The corresponding number, derived from
107 spots in the 2-D gel map of human T-lymphocyte
proteins, falls between 60 and 65% (J. Strahler, personal
communication). Information concerning N-terminal
blockage is not normally available, and in the Swiss-Prot
database only 6 of the 36 keratinocyte proteins are speci-
fied as N-terminally blocked. We have, within the present
material, defined 18 proteins for which the N-terminals
are very likely to be correctly described. Six of these pro-
teins are listed in the Swiss-Prot database as N-termi-
nally blocked, four represent proteins which appear in
the human liver 2-D gel map and have been N-termi-
nally sequenced as liver proteins [3] and the remaining
eight have N-terminal groups other than M, S and A, i.e.
N-terminals for which N-acetylation is uncommon [26].
In Figs. 4A, B, C and D pI values calculated from Swiss-
Prot database information are plotted against the experi-
mentally determined pI values for all the keratinocyte
proteins listed in Table 2 and for the 18 selected pro-
teins, as well as for the plasma and liver proteins taken
from [9] valid for 10 °C.

There are four plots: (A) the 36 polypeptides from normal human
keratinocytes (no corrections), (B) the 36 polypeptides from Fig. 4A
where pI values have been recalculated for 12 polypeptides with M,
S and A as N-terminally assumed blocked, based on calculated
charge, (C) the 18 selected polypeptides with information on the
N-terminal configuration, and (D) plasma and liver proteins.

The calculations show that without knowledge of the
status of the N-terminal group, precise predictions of pI
values for eukaryotic proteins cannot be achieved based
on the information available in Swiss-Prot and similar
databases. However, for proteins where the N-terminal
status is known, we find good correlation between pre-
dicted and experimental pI values. When the variance of
the pI discrepancies and the variance of calculated
charges at the experimental pI values derived from the
present data set are compared with the corresponding
values derived from the data on plasma and liver pro-
teins in [9] (Table 3), the present data are found to result
in larger variances for the values of both pI discrepancies
and calculated charge at the experimental pI value when
no information on posttranslational modification is
taken into consideration. Correction for possible N-acety-
lation of 12 polypeptides with M, S and A as N-terminal
results in a smaller variance of pI discrepancies, al-
though not significantly different from values derived
from [9], whereas the variance of the calculated charge at
the experimental pI value is significantly higher. For the
18 selected proteins the variance for the pI discrepancies
is significantly smaller than for the data in [9]; however,
the corresponding value for calculated charge at the
experimental pI value does not improve to the same
extent. This, we believe, reflects another difference
between the two sets of proteins used for the calcula-
tions. Based on spot distributions in 2-D gel maps, the
set of proteins used here has a molecular weight distri-
bution that is more representative of the patterns ob-
served in mammalian cells. In the study by Bjellqvist
[9] most of the high molecular weight plasma pro-
teins had to be excluded due to their unknown content
of sialic acid, which made the proteins analyzed in that
study heavily biased towards low molecular weight pro-
teins. The buffer capacity of proteins normally increases
with the protein's molecular weight, and the average
buffer capacity of the presently selected proteins with
assumed known N-terminals is 18 charge units/pH unit,
while the corresponding value for the proteins used in
[9] is only 9 charge units/pH unit. High buffer capacity
can be expected to improve the agreement between cal-
culated and experimental pI values. Inspection of the
data presented in Table 2 for the polypeptides with
assumed known N-terminals verifies the importance of
the buffer capacity. For 8 polypeptides having buffer
capacities higher than 15 charge units/pH unit, the calcu-
lations in all cases yielded pI discrepancies with absolute
values of less than O.W pH units. The largest discre-
pancy, 0.06 pH units, was observed for annexin II and
stathmin, proteins which have low buffer capacity: 0.9
and 6.6 charge units/pH unit, respectively. The proba-
bility that the focusing position of a protein with known
composition will fall within a certain distance from the
calculated pI value therefore cannot be predicted by the
variance alone. The buffer capacity of the specific protein
must be taken into consideration as well. As indicated
by the decrease of the variance of calculated charges at
the experimental pI value for the selected proteins, the
observed improvement cannot solely be due to the
higher buffer capacity of the keratinocyte proteins. The
two studies relate to different experimental conditions.
Good agreement between experimental and calculated
pI values implies that the proteins are unfolded and a
factor that may contribute to the observed improvement
is a more complete unfolding of proteins caused by the
higher temperature and urea concentration used in this
study.

The data indicate that the precision with which pI
values can be predicted for polypeptides with high buffer
capacity is better than the precision with which experi-
mental pI values can be determined. If the pH is defined
through the pK values of the immobilized groups in the
IPG containing gel, the precision of the experimentally
calculated data will depend on the pH difference
between the pI and the pK value of the immobilized
group with the closest pK. For the present study this will
give pI determinations with a precision varying in the
range of ± 0.02-0.05 pH units [9]. The good agreement
observed between the calculated and experimental pI
values is due to the fact that errors are mainly system-
atic and, as discussed in [9], they will largely be cancelled
out in the calculations. A pH scale defined through the
presently determined pI values will not necessarily
reflect the variation of the hydrogen ion activity during
the focusing step in an optimal way, but it still allows
precise predictions of focusing positions for polypeptides
with known compositions, including information on
posttranslational modifications. Calculated net charge at
the experimentally found isoelectric point defined in this
scale will serve as a tool to verify that the polypeptide
composition used in the calculation is correct and com-
plete. Exceptions to this are proteins such as involucrin
and heat shock protein 90 that have very high buffer
capacities. Introduction of an extra charge unit into
these proteins will only result in pI shifts falling in the
range of 0.01-0.02 pH units and the effect is that the
quality of the pH definition - the precision by which pK
values used in the calculations are given and the preci-
sion of experimental pI values in these cases - will limit
the possibilities to verify polypeptide composition based
on the experimental pI value.
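
The link between buffer capacity and pI shift in this argument can be put into numbers with a short sketch. The linear approximation (shift equals charge change divided by buffer capacity) and the function names are assumptions made for illustration. Of the inputs, only the average buffer capacities of 18 and 9 charge units/pH unit come from the text.

```python
def pi_shift(extra_charge_units, buffer_capacity):
    """Approximate magnitude of the pI shift (pH units) for a change in charge.

    Assumes the charge curve is linear near the pI, so that
    shift = charge change / buffer capacity (charge units per pH unit).
    """
    return extra_charge_units / buffer_capacity


def charge_at_experimental_pi(pi_discrepancy, buffer_capacity):
    """Approximate calculated charge at the experimental pI (same assumption)."""
    return pi_discrepancy * buffer_capacity


# Average buffer capacities quoted in the text (charge units/pH unit).
for capacity in (18, 9):
    print(f"capacity {capacity}: one charge unit shifts pI by "
          f"about {pi_shift(1, capacity):.3f} pH units")

# Under the same approximation, a 0.01-0.02 pH unit shift per charge unit
# implies a buffer capacity of roughly 50-100 charge units/pH unit.
print(1 / 0.02, 1 / 0.01)
```

On this approximation a missing or extra charge is easy to detect for a protein with a buffer capacity near 9, where it moves the pI by about 0.1 pH units, and hard to detect once the shift falls to the same size as the experimental precision.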

Statistical comparison of experimental and calculated pI
values was done using the t-test for dependent samples
and normality of the discrepancies was estimated by
probability plots. For the 36 proteins, the p-level is
0.0021, indicating that a result like this is unlikely to
be a chance effect and must be assumed to represent a
real difference. After correction for the most likely
N-terminal configuration, the p-level is 0.043, so the two
sets of values still cannot be accepted as representing
the same population, since the p-level is less than 0.05,
the traditional p-limit of statistical significance. For the
18 proteins with a known or very likely N-terminal
configuration the t-test gave a p-level of 0.49, which
indicates that the experimental and calculated pI values
are not significantly different.

Besides showing that pI values for denatured proteins
with known compositions can be calculated with a high
degree of precision from average pK values, the results
also provide strong support for the notion that
N-terminal blockage heavily depends on the nature of
the N-terminal groups [26]. The results seem to indicate
that with N-terminals other than M, S and A, only a few
proteins have blocked N-terminals (1 out of 10 proteins
in the present study), while it can be inferred from the
data presented in Table 2 that a majority of the proteins
with M, S and A as N-terminal are blocked. After correc-
tion for the effect of suspected N-terminal blockage
there is only one protein (nucleolar protein B23) out of
the 36 used in this study, which, in spite of a high buffer
capacity, has a marked difference of 0.11 pH units
between predicted and determined pI values (Fig. 4B);
this corresponds to 3 charge units due to the high buffer
capacity of this protein. This discrepancy in pI prediction
and calculation of net charge at the pI is probably not
due to deficiencies in the database information but
instead reflects a shortcoming of the model used for pI
calculations. Nucleolar protein B23 contains a domain
extremely rich in aspartic and glutamic acid residues
(Table 4), in which 26 out of 28 amino acid residues
from position 161 to 188 are either a D or an E. A calcu-
lation based on the use of average pK values unin-
fluenced by the charged neighboring amino acid resi-
dues cannot be expected to correctly describe the pI
value with almost half of the acidic groups packed
together into a highly negatively charged region. This
limitation caused by calculations based on average pK
values does not severely limit the usefulness of the
approach since a search through Swiss-Prot shows that
this type of D/E-rich motif is uncommon, and the exis-
tence of a highly charged region is immediately apparent
upon inspection of the amino acid sequence.

The quality of the information available in databases,
especially concerning posttranslational modifications, is
a major problem when the data are to be used for pI pre-
dictions. The p-level of 0.043 found for all 36 proteins
after correction for N-acetylation shows that this prob-
lem is not only limited to N-terminal blockage and the
very good agreement found for the eighteen polypep-
tides, with assumingly correctly described N-terminal
(Fig. 4C), must be regarded as an exception from this
point of view. N-Terminal blockage is generally the main
problem in relation to pI predictions for eukaryotic pro-
teins. Of the 36 keratinocyte proteins analyzed, 18-20
are suspected to be N-terminally blocked (6 proteins blo-
cked according to Swiss-Prot, 12 proteins with M, S or A
as N-terminal and assumingly blocked based on the cal-
culated charge, and two proteins, involucrin and
nucleolar protein B23, with M as N-terminal for which
the data do not allow any conclusion). This is in rea-
sonable agreement with the conclusions based on the
N-terminal sequencing data derived in connection with
2-D gel electrophoresis. N-terminal blockage can be sus-
pected for 17-19 of the 26 proteins with M, S or A as
N-terminal, while only 1 in 10 proteins with other
N-terminal groups are blocked. The information that the
frequency of N-terminal blockage is strongly related to
the nature of the N-terminal group will be of some help
in connection with pI predictions based on database
information. However, without information from other
sources, an uncertainty will always remain as to whether
the N-terminal charge should be included in the pI calcu-
lation.
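
The rule of thumb in this paragraph can be written as a small decision helper. This is a sketch only: the function name, the default behaviour and the example sequences are hypothetical, and the counts are the ones quoted in the text (17-19 of 26 proteins with M, S or A as N-terminal, 1 of 10 with other N-terminal groups).

```python
# N-terminal residues for which blockage was found to be frequent.
LIKELY_BLOCKED = {"M", "S", "A"}


def assume_n_terminal_blocked(sequence, annotated_blocked=None):
    """Guess whether to leave the N-terminal charge out of a pI calculation.

    An explicit annotation (True or False) takes priority. Without one,
    the guess follows the first residue of the sequence.
    """
    if annotated_blocked is not None:
        return annotated_blocked
    return sequence[0].upper() in LIKELY_BLOCKED


# Blocked fractions implied by the counts quoted in the text.
low, high = 17 / 26, 19 / 26   # proteins with M, S or A as N-terminal
other = 1 / 10                 # proteins with other N-terminal groups
print(f"M/S/A: {low:.0%}-{high:.0%} blocked, others: {other:.0%}")

# Made-up sequence fragments, for illustration only.
print(assume_n_terminal_blocked("MSTA"))   # True
print(assume_n_terminal_blocked("GKLV"))   # False
```

Because roughly a third of the M, S or A group is not blocked, such a guess remains uncertain for any single protein, which is the point made in the final sentence of the paragraph.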



4 Concluding remarks 

The data presented here lay the foundation for com-
paring 2-D gel protein maps of different cell types gener-
ated with nonlinear, wide-range IPGs in the first dimen-
sion. The focusing positions of 41 polypeptides common
to most human cell types have been described in a pH
scale that allows focusing positions to be predicted with
a high degree of accuracy, provided that the composition
of the polypeptides is known and that information on
posttranslational modifications is available. For poly-
peptides with a very high buffer capacity, the limiting
factor is the precision with which experimental pH
values can be determined rather than the precision of
the calculations. Possible deficiencies in the pH scale
description of the variation of the hydrogen ion activity
have, at least at the present state, no consequences for its
practical use. The major limitation in connection with
predictions of focusing positions from polypeptide com-
positions is the quality of existing data on protein com-
positions, especially concerning posttranslational modifi-
cations. Amino acid sequences have been reasonably
easy to obtain, while posttranslational modifications
have been difficult and work-intensive to determine.
Recent developments in the field of mass spectrometry
are fast changing this situation and within the next years
we can expect a surge in reliable data in this area. While
awaiting this development, verification of correctness
and completeness of available information on polypep-
tide composition can be provided by experimental pI
values in a pH scale based on the pI values determined
in this study. So far, our data cover the pH range below
pH - The basic pH range covered by NEPHGE as
first dimension will be covered in forthcoming work.

Received December 29, 1993